
Before the “data guys” get too carried away, it’s worth familiarising yourself with some of the most common terms, which we have put together below. Links are provided for more in-depth explanations of the more complex terms, such as neural networks.

| Term | What is it? | Area |
|---|---|---|
| Data Warehouse | Central repository of data from disparate sources, generally used for management reports | Data Management |
| Data Lake | Repository of raw data that can hold both structured and unstructured data | Data Management & Manipulation |
| Upstream | A data source that precedes the immediate one, e.g. the Data Warehouse is upstream from the management report | Data Management & Manipulation |
| Schema | The underlying structure of your database/ dataset | Data Management & Manipulation |
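| Downstream | A data destination that follows the immediate one, e.g. the management report is downstream from the Data Warehouse | Data Management & Manipulation |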
| Data cleansing | Correcting or removing inaccurate data | Data Management & Manipulation |
| Data Management Platform | Data Warehouse that sucks up and sorts information, and spits it out in a way that’s useful for marketers and publishers | Data Management & Manipulation |
| ETL | Extract, Transform, Load - three steps combined to move data from one system to another | Data Management & Manipulation |
| Structured data | Data organised into a fixed schema of rows and columns, typically in a SQL relational database | Data Management & Manipulation |
| Unstructured data | Text and non-text data with no predefined structure, not relational. Think social media posts | Data Management & Manipulation |
| Postgres, Redshift*, MySQL, SQL Server*, Oracle* | Open source and paid SQL database platforms (* = paid) | Data Management & Manipulation |
| Algorithm | A list of rules composed logically to solve a problem. Can consist of different steps split into data manipulation/ modeling | Data Science |
| Outlier | A value that lies far outside the typical range of a distribution | Data Science |
| Type 1 Error | Also known as false positives - when you predict that an outcome will occur and it does not | Data Science |
| Type 2 Error | Also known as false negatives - when you predict that an outcome will not occur and it does | Data Science |
| Predictive Analytics | The use of data/ statistics/ machine learning to analyse current data & predict future outcomes | Data Science |
| Machine Learning | A branch of computer science that enables computers to learn without being explicitly programmed | Data Science |
| Neural Network | | Data Science |
| Euclidean distance | | Data Science |
| Regression | | Data Science |
| Classification | | Data Science |
| Ensemble Modeling | | Data Science |
| K-Fold Cross-Validation | | Data Science |
| Clustering | | Data Science |
| Statistical Significance | | Data Science |
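
To make the two error types in the table more concrete, here is a minimal Python sketch that counts them for a set of yes/no predictions. A Type 1 error is a false positive (predicted to occur, did not), and a Type 2 error is the mirror image, a false negative (predicted not to occur, did). The function name `count_errors` and the sample data are hypothetical, made up purely for illustration.

```python
def count_errors(actual, predicted):
    """Return (type_1, type_2) error counts for paired boolean outcomes.

    Type 1 = false positive: predicted True, actually False.
    Type 2 = false negative: predicted False, actually True.
    """
    pairs = list(zip(actual, predicted))
    type_1 = sum(1 for a, p in pairs if p and not a)
    type_2 = sum(1 for a, p in pairs if a and not p)
    return type_1, type_2


# Hypothetical data: True means the outcome occurred (or was predicted to occur).
actual = [True, False, True, False, False, True]
predicted = [True, True, False, False, True, True]

type_1, type_2 = count_errors(actual, predicted)
print(f"Type 1 errors (false positives): {type_1}")
print(f"Type 2 errors (false negatives): {type_2}")
```

Which error type matters more depends on the problem: a spam filter that flags a genuine email is making a Type 1 error, while one that lets spam through is making a Type 2 error.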