Data

DSLAB GLOBAL
Mar 25, 2021


1. What is Data?

Data is a set of facts or characteristics collected through observation. Many industries continuously collect data in many forms:

  • Retail and finance (sales, revenue, profit, stock price data)
  • Manufacturing (defect rate, production rate, energy consumption rate data)
  • Social studies (homeless rate, crime rate, unemployment rate, literacy rate data)

2. Types of Data

Classification by data type

Structured Data

Structured data is data stored in a formatted repository, such as a spreadsheet or a table in a relational database system. Because it is backed by a schema, a lookup follows the order “table search, column search, row search”. This type of data usually lives in an internal system and has a designated internal format, so it is relatively easy to collect.

  • RDBMS table
  • Spreadsheet
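As a minimal sketch of the "table search, column search, row search" order, the Python snippet below uses the standard-library sqlite3 module with an in-memory database. The `sales` table, its columns, and the values are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical table and values, made up for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, revenue) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 45.0)],
)

# Table search: sales. Column search: revenue. Row search: region = 'north'.
rows = conn.execute(
    "SELECT revenue FROM sales WHERE region = ? ORDER BY id",
    ("north",),
).fetchall()
north_revenues = [row[0] for row in rows]
conn.close()
```

Because the schema fixes the table and column names in advance, the query can name exactly what it wants without inspecting the data first.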

Semi-structured Data

Semi-structured data is data in a file format that carries its own metadata, such as tags or keys, which is a trait it shares with structured data. Parsing it successfully requires understanding how the data is structured. This type of data is often delivered through APIs and needs some data processing before it can be used. Transforming it into structured data also requires some changes to the data architecture.

  • HTML retrieved from URLs
  • XML or JSON returned by APIs
  • Web logs, IoT sensor data
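The sketch below shows one way semi-structured data might be parsed into structured rows, using Python's standard json module. The payload and its field names (`device`, `readings`, `ts`, `temp`, `humidity`) are assumptions invented for this example and do not come from a real API.

```python
import json

# Hypothetical API response; every field name here is an assumption.
payload = """
{
  "device": "sensor-01",
  "readings": [
    {"ts": "2021-03-25T10:00:00", "temp": 21.5},
    {"ts": "2021-03-25T10:05:00", "temp": 21.9, "humidity": 40}
  ]
}
"""

def flatten_readings(raw):
    """Turn the nested document into fixed-width rows (one per reading)."""
    doc = json.loads(raw)
    columns = ("device", "ts", "temp", "humidity")
    rows = []
    for reading in doc["readings"]:
        record = {"device": doc["device"], **reading}
        # Keys missing from a reading become None so every row has the same shape.
        rows.append(tuple(record.get(col) for col in columns))
    return columns, rows

columns, rows = flatten_readings(payload)
```

The keys act as embedded metadata, but each record may carry a different set of them, so the parser has to know the structure and decide how to fill the gaps.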

Unstructured Data

Unstructured data is data that has no predefined format and is not organized in a predefined manner. Text, images, and videos are examples. HTML may also be classified as semi-structured data, but it is hard to classify precisely because in some cases its content is collected through text mining. Unstructured data has to be parsed and converted into a meta-structure before use. Processing it is relatively challenging, because organizing it into a structured format also requires changes to the data architecture.

  • Images and videos in binary file format
  • Texts in script file format
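As a rough illustration of converting unstructured text into a simple structure, the sketch below tokenizes a string and counts word frequencies with Python's standard library. The sample sentence and the tokenization rule are assumptions chosen for brevity; real text mining usually needs far more careful processing.

```python
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Hypothetical sample text, made up for this illustration.
sample = "Data is collected. Collected data is stored."
counts = word_counts(sample)
```

The resulting word-to-count mapping is one possible meta-structure: it can be stored as rows of (word, count) even though the original text had no schema.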

Classification by data location

Internal Data

Internal data is data whose original source is stored in an internal system. Because the original data and the collected data are both stored internally, data communication is easier and faces relatively few technical restrictions.

External Data

External data is data whose original source is stored in an external system. Because the original data is stored outside, data communication requires consent from the data provider. Collecting it also requires analyzing the collection cycle and the collection methods.
