COMP63301 Data Engineering Concepts

Data engineering tends to involve a lifecycle, in which typical phases include data acquisition, profiling, cleaning, integration, modelling, and usage.
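As a rough illustration of how these phases can chain together, the sketch below runs a toy dataset through acquisition, profiling, cleaning, integration and modelling, using only the Python standard library. It is a minimal sketch rather than course material: the data (RAW_ORDERS, CUSTOMERS), the function names and the cleaning rules are all hypothetical and were chosen only to make each phase visible.

```python
import csv
import io
from collections import Counter

# Hypothetical raw inputs, invented purely for illustration.
RAW_ORDERS = """order_id,customer,amount
1,alice,10.50
2,BOB,
3,alice,4.25
3,alice,4.25
"""
CUSTOMERS = {"alice": "UK", "bob": "DE"}  # hypothetical second data source


def acquire(text):
    """Acquisition: parse raw CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def profile(records):
    """Profiling: count empty values per column."""
    missing = Counter()
    for record in records:
        for column, value in record.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    return dict(missing)


def clean(records):
    """Cleaning: normalise names, drop rows with no amount, remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        if not record["amount"].strip():
            continue  # assumed rule: an order without an amount is unusable
        row = (
            int(record["order_id"]),
            record["customer"].strip().lower(),
            float(record["amount"]),
        )
        if row in seen:
            continue  # exact duplicate of an earlier row
        seen.add(row)
        cleaned.append({"order_id": row[0], "customer": row[1], "amount": row[2]})
    return cleaned


def integrate(orders, customers):
    """Integration: attach a country to each order from the second source."""
    return [
        {**order, "country": customers.get(order["customer"], "unknown")}
        for order in orders
    ]


def model(orders):
    """Modelling: aggregate total spend per country."""
    totals = {}
    for order in orders:
        totals[order["country"]] = totals.get(order["country"], 0.0) + order["amount"]
    return totals


if __name__ == "__main__":
    records = acquire(RAW_ORDERS)
    print(profile(records))                                # {'amount': 1}
    print(model(integrate(clean(records), CUSTOMERS)))     # {'UK': 14.75}
```

Real pipelines rarely run as a single linear pass. Profiling results usually feed back into cleaning rules, and the usage phase (analysis, visualisation) often exposes quality problems that send work back to earlier stages.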

This unit introduces students to the stages of the data engineering lifecycle and to related concepts, tasks and techniques. It explores selected aspects of the lifecycle in greater depth, e.g., transformation, modelling and visualisation, and addresses cross-cutting topics such as security, trust, and robustness. We will investigate pain points, trade-offs, limitations and evaluation criteria that can inform the development of data engineering pipelines in practice.

This is a draft handbook to accompany the teaching of COMP63301 at the Department of Computer Science at The University of Manchester.

Reading list

The reading list for COMP63301 includes:

Related publications include:

Aims

The unit aims to give students an understanding of the concepts that underpin data engineering, together with experience of applying those concepts. Data engineering provides the processes and mechanisms that enable value to be obtained from data. These processes and mechanisms can be viewed as forming a data engineering lifecycle, and this unit explores the concepts that underpin the different stages of that lifecycle, which include data transformation and visualisation.

Intended Learning Outcomes

  1. Explain the Data Engineering (DE) lifecycle, related concepts, challenges and research questions.

  2. Identify relevant data properties, understanding the shape of data and its representation of the world.

  3. Apply selected DE techniques for data integration, cleaning, transformation and visualisation, ensuring data quality for the purpose of data analysis.

  4. Critically analyse data engineering technologies.

  5. Discuss trade-offs between various design options.