Course information: DATA3800, Autumn 2024
DATA3800 Introduction to Data Science with Scripting: Course plan
 English course name
 Introduction to Data Science with Scripting
 Study programme

Bachelor's Programme in Information Technology / Bachelor's Programme in Engineering – Mathematical Modelling and Data Science / Bachelor's Programme in Applied Computer Technology / Bachelor's Programme in Engineering – Computer Science
 Credits
 10 ECTS credits
 Academic year
 2024/2025

Introduction
Data is the new oil, powering industries, driving trillion-euro companies and helping governments make decisions that affect the lives and welfare of billions of people around the world. But to do so, data must be refined, properly analysed, and presented so that the relevant decision makers can make sense of it and use it in a manner that delivers value to society. Data Science is the field of study that focuses on collecting, organizing, cleaning, understanding, transforming, using and presenting data so that it becomes useful.
In this course you will learn what Data Science is and how to approach Data Science problems so that the field can contribute to a sustainable future. We will briefly question some common ideas about what science is and how scientific research is done. We will address what makes a research method suitable or unsuitable, focusing on specific cases to learn from successes and disasters in the history of Data Science.
You will learn the methods, potential and limits of Data Science, as well as how to apply them to real-world challenges using a scripting language (Python, MATLAB or R). The course is designed to provide a solid theoretical introduction to the subject and to build foundational skills through hands-on experience. To achieve that, you will use open data sources to develop a data science project, from data collection to the presentation of insights.
Recommended prior knowledge
Basic algebra, basic mathematical analysis and statistics are highly recommended, though a short overview of the fundamentals of these topics will be provided. The course has a practical part that uses code in Python, MATLAB or R. Familiarity with these programming languages is not required, but some experience with a similar programming language is recommended.
Learning outcomes
After completing this course, the student should have achieved the following learning outcomes:
Knowledge
Upon successful completion of the course, the candidate will have knowledge of:
 the most commonly used methods in data science to clean, impute, analyse and present data;
 the context in which these methods should be applied;
 the specific cautions and pitfalls that should be taken into account throughout the entire research process, particularly when using tools from statistical analysis;
 practical data problems in different fields of science, ranging from fundamental and natural sciences to social sciences and engineering;
 how statistical analysis can be used to uncover the features and properties of a specific set of data;
 the main features and techniques one should be aware of for data collection;
 programming languages applicable to data analysis and modelling.
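As an illustration of the cleaning and imputation methods mentioned above, the following is a minimal sketch in Python of mean imputation, one simple way to fill in missing values. The function name `impute_mean` and the sample readings are hypothetical and are not part of the course material; whether mean imputation is appropriate depends on the dataset and on why values are missing.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical temperature readings with two missing entries.
readings = [10.0, None, 14.0, 12.0, None]
cleaned = impute_mean(readings)
```

Mean imputation keeps the sample mean unchanged but reduces the variance of the data, which is one of the pitfalls to keep in mind when analysing imputed data.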
Skills
Upon successful completion of the course, the candidate will be able to:
 use a scripting language to perform basic data science operations;
 translate problems into research questions and evaluate their soundness;
 propose a first design of experiments to approach specific research questions;
 critically assess the quantitative analysis presented for a research question, including the authors’ interpretation of the presented results, e.g. regarding the correlation between different variables, their possible functional relations and the statistical significance of the overall results;
 develop a computational framework to generate surrogate data sets with particular statistical features, as numerical experiments for testing specific data models;
 apply statistical analysis and mathematical modelling techniques to data from their field of study.
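The idea of generating surrogate data with particular statistical features can be sketched as follows. This is a minimal example that assumes the features of interest are only the sample mean and standard deviation; the function name `surrogate_gaussian` and its parameters are hypothetical and are not taken from the course material.

```python
import random
from statistics import fmean, pstdev


def surrogate_gaussian(n, target_mean, target_std, seed=0):
    """Draw n Gaussian values, then rescale them so that the sample mean and
    (population) standard deviation match the targets up to rounding error.
    Requires n >= 2 so that the raw sample has a non-zero spread."""
    rng = random.Random(seed)  # seeded generator for reproducibility
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = fmean(raw), pstdev(raw)
    return [target_mean + target_std * (x - m) / s for x in raw]


sample = surrogate_gaussian(1000, target_mean=5.0, target_std=2.0)
```

A data model can then be run on `sample` to check whether it recovers the known mean and spread before it is applied to real data.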
General competences
Upon successful completion of the course, the student
 will be able to construct and establish a research plan
 will be able to create a data analysis pipeline where data is refined and transformed through scripts
 will be able to carry out a basic quantitative analysis of their results
 will have a critical understanding of the limitations and possibilities of large datasets and statistical analysis
Teaching and learning methods
This course features lectures and lab work to provide both theoretical and hands-on content. Students will work in groups or individually to complete the assignments given to them. Students will supplement the lectures and lab work by reading the recommended literature.
Coursework requirements and compulsory activities
The following coursework is compulsory and must be approved before the student can take the exam:
Mandatory assignment 1: Students will select an open dataset and a research problem of their choice, study it carefully in the light of the scientific literature, and submit a text (300 to 500 words) explaining the reasons for their choice and how the dataset could be used to create value for society or to support an existing or future business.
Mandatory assignment 2: Building upon assignment 1, students will create and present a data analysis pipeline using the chosen dataset and self-selected problem. The pipeline should be implemented in code using the data analysis and scripting techniques taught during the course.
Assessment and examination
This is a portfolio exam that consists of a report based on the data analysis pipeline developed in the mandatory assignments and its results.
The portfolio will consist of two parts, a report and a presentation:
 The report is a careful description of the work done during the semester. It should contain the code, graphs and notes, together with a sample of the dataset.
 A presentation of at most 20 minutes that places the content of the report within a coherent narrative and clarifies any unclear steps in the data processing, analysis, results or conclusions.
The portfolio will be assessed as a whole.
In case of a new or postponed examination, an alternative examination format may be used. The oral presentation cannot be appealed.
Permitted exam materials and equipment
All support materials are allowed for both the oral presentation and the individual written summary.
Grading scale
The final assessment will be graded on a scale from A to E (A is the highest grade and E the lowest), with F for fail.
Examiners
Two examiners will be used, one of whom may be external. An external examiner is used regularly.
Course overlap
Overlaps 90% with STKD6060