Data
1. What is Data?
Data is a set of facts or characteristics collected through observation. Many industries continuously collect data in many forms:
- Retail and finance (sales, revenue, profit, stock price data)
- Manufacturing (defect rate, production rate, energy consumption rate data)
- Social studies (homeless rate, crime rate, unemployment rate, literacy rate data)
2. Types of Data
Classification by data type
Structured Data
Structured data is data stored in a fixed format, such as spreadsheets or tables in a relational database system. Because a schema defines its structure, a lookup proceeds by table, then column, then row. This type of data usually lives in an internal system and has a designated internal format, so it is relatively easy to collect.
- RDBMS table
- Spreadsheet
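The table, column, row lookup described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration only: the "sales" table, its columns, and the sample values are assumptions made up for the example, not data from any real system.

```python
import sqlite3

# In-memory database; the "sales" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, revenue) VALUES (?, ?)",
    [("north", 120.0), ("south", 95.5), ("north", 80.0)],
)

# The schema lets the query name a table (FROM), a column (SELECT),
# and a row filter (WHERE) directly.
rows = conn.execute(
    "SELECT revenue FROM sales WHERE region = ? ORDER BY id", ("north",)
).fetchall()
north_revenue = [row[0] for row in rows]
conn.close()
```

Because the schema is known in advance, no parsing step is needed before querying.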
Semi-structured Data
Semi-structured data is data in a file format that carries its own metadata, such as tags or keys, which is a trait it shares with structured data. Parsing it successfully requires understanding its structure. This type of data is often delivered through an API and requires some data processing. Transforming it into structured data also requires some changes to the data architecture.
- HTML in URL links
- XML, JSON from API format
- Web logs, IoT sensor data
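As a rough sketch of turning semi-structured data into structured rows, the example below parses a JSON document and flattens it into uniform records. The payload, its field names, and the flatten helper are all hypothetical and stand in for whatever a real API would return.

```python
import json

# Hypothetical API response; the field names are assumptions for illustration.
payload = """
{"device": "sensor-1",
 "readings": [
   {"ts": "2024-01-01T00:00:00Z", "temp_c": 21.5},
   {"ts": "2024-01-01T00:01:00Z", "temp_c": 21.7, "humidity": 40}
 ]}
"""

def flatten(doc):
    """Turn one nested document into a list of flat row dicts."""
    rows = []
    for reading in doc["readings"]:
        rows.append({
            "device": doc["device"],
            "ts": reading["ts"],
            "temp_c": reading["temp_c"],
            # Keys may be missing in semi-structured data; default to None.
            "humidity": reading.get("humidity"),
        })
    return rows

rows = flatten(json.loads(payload))
```

Every resulting row has the same four keys, so the output could be loaded into a table even though the input records did not all carry the same fields.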
Unstructured Data
Unstructured data is data that has no predefined format and is not organized in a predefined manner. Text, images, and videos are examples of unstructured data. HTML may also be classified as semi-structured data, but it is difficult to classify precisely because in some cases its content is collected through text mining. Unstructured data must be parsed and converted into a meta-structure. Processing it is relatively challenging, as it also requires changes to the data architecture to organize the data into a structured format.
- Images and videos in binary file format
- Texts in script file format
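One simple way to impose structure on free text is pattern matching. The sketch below assumes a made-up set of notes and a made-up sentence pattern; real unstructured text is rarely this regular, so this illustrates the idea rather than a general method.

```python
import re

# Hypothetical free-text notes; only some lines contain extractable facts.
notes = [
    "Order 1042 shipped to Seoul on 2024-03-05.",
    "Customer called about a late delivery.",
    "Order 1043 shipped to Busan on 2024-03-06.",
]

# Assumed sentence shape: "Order <id> shipped to <city> on <YYYY-MM-DD>".
PATTERN = re.compile(r"Order (\d+) shipped to (\w+) on (\d{4}-\d{2}-\d{2})")

def extract(lines):
    """Return one structured record per line that matches the pattern."""
    records = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            records.append({
                "order_id": int(match.group(1)),
                "city": match.group(2),
                "date": match.group(3),
            })
    return records

records = extract(notes)
```

Lines that do not match the pattern are skipped, which shows why unstructured data is harder to process: anything the parser does not anticipate is lost unless it is handled separately.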
Classification by data location
Internal Data
Internal data is data whose original is stored in an internal system. Because both the original data and the collected data are stored internally, data communication is easier and faces relatively few technical restrictions.
External Data
External data is data whose original is stored in an external system. Because the original data is stored externally, data communication requires consent from the data provider. Collecting it also requires analyzing the collection cycle and methods.
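The distinction by location can be captured in a small catalog record. The sketch below is purely illustrative: the DataSource class, its fields, and the can_collect rule are hypothetical names chosen for this example, not part of any standard or library.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    """Hypothetical catalog entry describing where a data set lives."""
    name: str
    location: str                     # "internal" or "external"
    provider_consent: bool = False    # only meaningful for external sources
    collection_cycle_hours: int = 24  # assumed default collection interval

def can_collect(source: DataSource) -> bool:
    """Internal sources are collectable; external ones need provider consent."""
    if source.location == "internal":
        return True
    return source.provider_consent

erp = DataSource("erp_orders", "internal")
partner = DataSource("partner_feed", "external")
weather = DataSource("weather_feed", "external", provider_consent=True,
                     collection_cycle_hours=1)
```

Recording the collection cycle next to the consent flag keeps the two questions raised for external data, whether collection is permitted and how often it should run, in one place.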