Data-engineering-101

Posted on: March 15, 2025 at 01:20 AM

Data engineering

This blog covers some commonly used terms in the world of data engineering.

Types of data sources
- Structured data source: Data organized as tables of rows and columns.

- Semi-structured data source: Data that is not in tabular form but still has some structure. Ex - JSON, XML

- Unstructured data source: Data that does not have any pre-defined structure. Ex - text, video, audio, images, etc.
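
To make the three categories concrete, here is a minimal Python sketch. The sample values (names, cities, the note text) are made up for illustration; the point is only how each kind of data is read.

```python
import csv
import io
import json

# Structured: every record has the same columns, like rows in a table.
csv_text = "id,name,city\n1,Asha,Pune\n2,Ben,Oslo\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: records describe themselves; fields may nest or be missing.
json_text = '[{"id": 1, "name": "Asha", "tags": ["admin"]}, {"id": 2, "name": "Ben"}]'
records = json.loads(json_text)

# Unstructured: no schema at all; any structure has to be extracted by us.
note = "Asha called about the late delivery and asked for a refund."
word_count = len(note.split())

print(rows[0]["city"])       # column access works for every row
print("tags" in records[1])  # the second record simply has no "tags" field
print(word_count)
```

Notice that the CSV rows can all be accessed by the same column names, the JSON records have to be checked field by field, and the free text gives us nothing until we process it ourselves.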

Types of source systems
- Databases: Store data in an organized way, either structured or semi-structured.

- Files: A sequence of bytes representing information. Ex - TXT, PNG, MP3, CSV, etc.

- Streaming systems: A continuous flow of data, usually semi-structured. Ex - IoT sensors
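
The sketch below shows, under simplified assumptions, what reading from each kind of source system can look like in Python. The in-memory SQLite table, the `sample.csv` file, and the `sensor_stream` generator are all hypothetical stand-ins; a real streaming source would typically be a message broker or a device feed rather than a generator.

```python
import json
import os
import sqlite3
import tempfile


def read_from_database():
    """Database: query organized data with SQL (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES ('Asha')")
    rows = conn.execute("SELECT id, name FROM users").fetchall()
    conn.close()
    return rows


def read_from_file():
    """File: just a sequence of bytes; the format decides how to interpret it."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.csv")
        with open(path, "wb") as f:
            f.write(b"id,name\n1,Asha\n")
        with open(path, "rb") as f:
            return f.read()


def sensor_stream(n):
    """Streaming: a hypothetical IoT sensor that emits one JSON event at a time."""
    for i in range(n):
        yield json.dumps({"sensor_id": "s1", "seq": i, "temp_c": 20.0 + i})


print(read_from_database())
print(read_from_file())
for event in sensor_stream(3):
    print(event)
```

The database and file reads return everything at once, while the stream is consumed event by event as the data arrives.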

ACID properties
- Atomicity: A transaction is treated as a single, indivisible unit: either all of its operations take effect or none of them do.

- Consistency: Any changes to the data made within a transaction must follow the rules and constraints defined by the database schema, so the database moves from one valid state to another.

- Isolation: Concurrent transactions do not interfere with each other. Ideally, the result is the same as if they had been executed one after another.

- Durability: Once a transaction is committed, its effects are permanent and will survive subsequent system failures.
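
Atomicity and consistency are easy to see in a small experiment. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `accounts` table and a `transfer` helper written for this post; it does not try to demonstrate isolation or durability, which need concurrent connections and an on-disk database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency rule: a balance may never go below zero.
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )


def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit is undone too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer credits bob first and then fails on alice's debit because of the `CHECK` constraint. Since both updates are in one transaction, the credit is rolled back as well and the balances stay exactly as they were after the first transfer.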

To be continued...