De-mystified data lingo

Before the “data guys” get too carried away, it’s worth familiarising yourself with some of the most common terms, which we have put together below. Links are provided for more in-depth explanations of the more complex terms, such as neural networks.

| Term | What is it? | Area |
|---|---|---|
| Data Warehouse | Central repository of data from disparate sources, generally used for management reports | Data Management |
| Data Lake | Repository of raw data that can consist of structured and unstructured data | Data Management & Manipulation |
| Upstream | A data source that precedes the immediate one, e.g. the Data Warehouse is upstream from the management report | Data Management & Manipulation |
| Schema | The underlying structure of your database or dataset | Data Management & Manipulation |
| Downstream | A data destination that follows the immediate one, e.g. the management report is downstream from the Data Warehouse | Data Management & Manipulation |
| Data cleansing | Correcting or removing inaccurate data | Data Management & Manipulation |
| Data Management Platform | Data Warehouse that sucks up and sorts information, and spits it out in a way that’s useful for marketers and publishers | Data Management & Manipulation |
| ETL | Extract, Transform, Load: three functions that are combined to move data from a source system to a destination | Data Management & Manipulation |
| Structured data | Data organised logically in a SQL relational database | Data Management & Manipulation |
| Unstructured data | A mass of text and non-text data that is not relational. Think social media posts | Data Management & Manipulation |
| Postgres, Redshift*, MySQL, SQL Server*, Oracle* | Open source and paid SQL database platforms (* = paid) | Data Management & Manipulation |
| Algorithm | A list of rules composed logically to solve a problem. Can consist of different steps split into data manipulation and modeling | Data Science |
| Outlier | A value that lies outside the normal range of a distribution | Data Science |
| Type 1 Error | Also known as a false positive: when you predict that an outcome will occur and it does not | Data Science |
| Type 2 Error | Also known as a false negative: when you predict that an outcome will not occur and it does | Data Science |
| Predictive Analytics | The use of data, statistics and machine learning to analyse current data and predict future outcomes | Data Science |
| Machine Learning | A branch of computer science that enables computers to learn without being explicitly programmed | Data Science |
| Neural Network | | Data Science |
| Euclidean distance | | Data Science |
| Regression | | Data Science |
| Classification | | Data Science |
| Ensemble Modeling | | Data Science |
| K-Fold Cross-Validation | | Data Science |
| Clustering | | Data Science |
| Statistical Significance | | Data Science |
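
To make the ETL, schema and data cleansing entries a little more concrete, here is a minimal sketch in Python. It is only an illustration: the CSV data, the `sales` table and the function names are all made up for this example, and a real pipeline would usually read from and write to proper database platforms rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an upstream source (made-up data).
RAW_CSV = """customer,country,spend
Alice,UK,120.50
bob,uk,80
Carol,FR,not available
"""


def extract(text):
    """Extract: read the raw rows from the upstream source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse the rows by correcting or removing inaccurate data."""
    clean = []
    for row in rows:
        try:
            spend = float(row["spend"])
        except ValueError:
            continue  # remove rows whose spend is not a number
        # correct inconsistent capitalisation
        clean.append((row["customer"].title(), row["country"].upper(), spend))
    return clean


def load(rows, conn):
    """Load: write the cleansed rows into the downstream table."""
    # The CREATE TABLE statement is the schema: the structure of the dataset.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, country TEXT, spend REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer, country, spend FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

In this sketch the CSV is upstream of the `sales` table, and any report built on `sales` would be downstream of it. The row for Carol is dropped because its spend cannot be read as a number.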
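
Type 1 and Type 2 errors are easier to tell apart with a small count. The sketch below uses made-up predictions and outcomes: a Type 1 error (false positive) is a case where the outcome was predicted but did not occur, and a Type 2 error (false negative) is a case where the outcome occurred but was not predicted.

```python
# Made-up predictions and actual outcomes (True = the outcome occurred).
predicted = [True, True, False, False, True, False]
actual = [True, False, False, True, True, False]

# Type 1 error (false positive): predicted to occur, but it did not.
type_1 = sum(p and not a for p, a in zip(predicted, actual))

# Type 2 error (false negative): predicted not to occur, but it did.
type_2 = sum(a and not p for p, a in zip(predicted, actual))

print(f"Type 1 errors: {type_1}, Type 2 errors: {type_2}")
# prints "Type 1 errors: 1, Type 2 errors: 1"
```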
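
What counts as "outside the normal range" for an outlier depends on the rule you pick. One common rule of thumb, used here purely as an assumption for illustration, flags any value more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The numbers are made up.

```python
import statistics

# Made-up daily order counts with one suspicious value.
values = [10, 12, 11, 13, 12, 11, 95]

# Quartiles: the cut points that split the sorted data into four parts.
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Rule of thumb (an assumption, not the only definition):
# anything beyond 1.5 * IQR from the quartiles is treated as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower or v > upper]
print(outliers)  # prints [95]
```

Other rules, such as flagging values a set number of standard deviations from the mean, are just as legitimate and can give different answers on the same data.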
