DAT500_1

Data-intensive Systems

This is the study programme for 2019/2020. It is subject to change.


The course provides a strong basis in the administrative, programming, and algorithm design aspects of data-intensive systems.

Learning outcome

Knowledge:
  • Characterize the Hadoop job tracker, task tracker, scheduling issues, communications, and resource management;
  • Describe elements of the Hadoop ecosystem and identify their applicability;
  • Describe and compare RDBMS, data warehouse, unstructured big data, and keyed files, and show how to apply them to typical data processing problems;
  • Understand the algorithmic complexity of worst-case, expected-case, and best-case running time, and the orders of complexity; apply the analysis to real-life algorithms.

Skills:
  • Design, construct, test, and benchmark a small data processing cluster (based on Hadoop);
  • Analyze real-life problems and propose suitable solutions;
  • Construct programs based directly on the MapReduce paradigm for typical problems;
  • Construct programs based on high-level tools (for the MapReduce paradigm) for typical problems;
  • Analyze the influence of peak and sustained bandwidth rate on system performance.

General qualifications:
  • Evaluate, communicate and defend a data-intensive solution w.r.t. relevant criteria.

Contents

The emergence of Big Data and Data Intensive Systems as specialized fields in computing is motivating the development of new techniques and technologies needed to extract knowledge from large datasets. Popular interest in data-intensive systems has grown since Hadoop was conceived in 2005. This course is a first step towards a variety of roles related to data-intensive systems. The core tasks in these roles that the course addresses are:
  • system administration (set up, test, and benchmark a cluster);
  • low-level algorithm design and implementation (direct implementation of MapReduce jobs);
  • high-level algorithm design and implementation (using one of the data processing languages, e.g. HiveQL);
  • data modelling and algorithm optimisation;
  • reliable infrastructure design for data collection and processing;
  • advocating technology application in both technical and non-technical settings;
  • providing introductory training to coworkers.
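As an informal illustration of what "direct implementation of MapReduce jobs" refers to, the sketch below simulates the map, shuffle, and reduce phases of a word count in plain Python. It is not course material and does not use Hadoop; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions chosen for this example only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated word in one line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key; a real framework does this between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, used only to exercise the sketch.
    sample = ["big data big clusters", "data intensive systems"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the grouping step; here everything runs in a single process so that the data flow is easy to follow.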

Required prerequisite knowledge

None.

Exam

Form of assessment                     Weight   Duration   Marks   Aid
Project work with oral presentation    1/1                 A - F   All
Project with oral presentation. The project is completed in groups.
Both parts must be completed before a final grade is given. Each group member can receive a different grade based on their performance during the oral examination.
If a student fails the project work, she/he has to take this part again the next time the course is taught.

Coursework requirements

Four assignments
Students start with four mandatory assignments covering programming and system administration. Mandatory assignments are to be completed individually. All mandatory assignments must be passed by the deadline for a student to have the right to start the project.
Mandatory lab assignments must be completed at the times and in the groups that are assigned and published. Absence due to illness or other reasons must be communicated to the laboratory personnel as soon as possible. Students cannot expect provisions for completing lab assignments at other times unless this has been agreed with the laboratory personnel in advance.

Course teacher(s)

Course coordinator
Tomasz Wiktorski
Head of Department
Tom Ryen

Method of work

The work consists of 6 hours per week of lectures, scheduled laboratory work, and supervised group work. Students are expected to spend an additional 6-8 hours a week on self-study, group discussions, and development work (open laboratory).

Open to

Bachelor's and master's students at the Faculty of Science and Technology.

Course assessment

Form and/or discussion.

Literature

Wiktorski, T. (2018). Data-intensive Systems: Principles and Fundamentals using Hadoop and Spark. Springer Nature. ISBN 978-3-030-04602-6.
White, T. (2012). Hadoop: The Definitive Guide. O'Reilly Media, Inc. ISBN 978-1-449-31152-0.



Last updated: 13.11.2019
