ACIT4530 Data Mining at Scale: Algorithms and Systems Course description
- English course name
- Data Mining at Scale: Algorithms and Systems
- Study programme
- Master's Programme in Applied Computer and Information Technology
- Credits
- 10.0 ECTS credits
- Academic year
- 2021/2022
- Reading list
- Spring 2022
Introduction
We are witnessing the era of big data, in which data is generated, collected, and processed at an unprecedented scale, and data-driven decisions influence many aspects of modern life.
Data mining is the process of discovering patterns in large data sets using methods from statistics and database systems.
Many applications, such as IoT sensors, generate large volumes of streaming data. Mining data streams and learning from them is therefore becoming increasingly important and urgent.
Extracting knowledge from data sets requires not only computational power but also programming abstractions and analytical skills. In this course, students will be exposed to different approaches to data mining and stream processing, such as association rule learning, anomaly detection, data clustering, visualization, and extracting statistical features on the fly from large data streams. Students will also be exposed to different data mining systems, including the MapReduce landscape and the ecosystem it spawned, such as Spark and its contemporaries. With a focus on data mining applications, we will also study some powerful numerical linear algebra methods.
Recommended prior knowledge
An individual project report of approximately 2500-5000 words, excluding appendices.
The exam can be appealed.
New/postponed exam
In case of a failed exam or legal absence, the student may apply for a new or postponed exam. New or postponed exams are offered within a reasonable time span following the regular exam. The student is responsible for applying for a new/postponed exam within the time limits set by OsloMet. The regulations for new or postponed examinations are available in Regulations relating to studies and examinations at OsloMet.
Prerequisites
No formal requirements over and above the admission requirements.
Learning outcomes
The student should have the following outcomes upon completing the course:
Knowledge
Upon successful completion of the course, the student:
- has a deep understanding of how data mining can be used to extract knowledge from data sets
- has advanced knowledge of the different data mining algorithms
- is able to use data mining systems to mine data
Skills
Upon successful completion of the course, the student:
- can design and implement data mining algorithms
- can deploy different data mining systems and configure them
- can utilize a specialized library for data mining
General competence
Upon successful completion of the course, the student:
- can analyse data mining solutions with regard to robustness and in relation to their intended tasks
- can explain how data mining can be used in different application areas such as business analytics
Content
All aids are permitted.
Teaching and learning methods
This course is divided into two parts. The first part focuses on the principles of data mining and stream processing. Seminars will cover the methodological aspects of data mining and stream processing, as well as the programming paradigms and software tools that enable them.
The second part focuses on the students completing a programming project. The project can be chosen from a portfolio of available problems. The student will work in a group on the project and submit a final code base with a report.
During this part, there may be lectures if needed, but most of the time will be spent on individual supervision of students in lab sessions.
Practical training
Lab sessions.
Coursework requirements and mandatory activities
None.
Assessment and exam
The course covers the foundations of and recent advances in machine learning from the point of view of statistical learning theory. The goal of this course is to provide students with the practical skills to support the theoretical knowledge acquired during the lectures and the practical intuition needed to use and develop effective machine learning solutions to challenging problems.
Access to good statistical and data analysis software is paramount. Therefore, we will illustrate the use of the models throughout the course with real implementations.
Permitted exam aids
No formal requirements over and above the admission requirements.
Grading scale
The student should have the following outcomes upon completing the course:
Knowledge
Upon successful completion of the course, the student:
- has a good understanding of the different concepts and methods of supervised and unsupervised statistical learning and how to apply them to large data sets
- has advanced knowledge of the probabilistic formulation of the various learning problems
Skills
Upon successful completion of the course, the student:
- can apply different high-dimensional regression techniques on data
- can apply different classification techniques on data
- can apply clustering techniques on data
- can derive learning algorithms for new models and analyze new data with them
- can apply dimensionality reduction techniques on data
General competence
Upon successful completion of the course, the student:
- can apply different predictive models on data and assess their performance
- can use supervised and unsupervised learning in different real-life problems
Examiners
This course is divided into two parts. The first part focuses on the principles of statistical learning. Seminars will cover the methodological aspects of statistical learning, mainly supervised and unsupervised learning.
The second part focuses on the students completing a programming project. This is a real data analysis problem, where the student is asked to carry out the analysis using the tools and techniques from the course and hand in a report documenting the steps taken in the analysis. The ultimate goal is to build a predictive model.
The project report must be at least 25 pages and at most 60 pages long.
During this part, there may be lectures if needed, but most of the time will be spent on individual supervision of students in lab sessions.
Practical training
Lab sessions.
Course coordinator
The following required coursework must be approved before the student can take the exam:
One mandatory assignment: a project plan document containing a description of the chosen data set, a preliminary research question, and the suggested tools and methods to apply.