
DATA3800 Introduction to Data Science with Scripting Course description

English course name
Introduction to Data Science with Scripting
Study programme
Bachelorstudium i ingeniørfag - data / Bachelorstudium i informasjonsteknologi / Bachelorstudium i anvendt datateknologi / Bachelorstudium i ingeniørfag – matematisk modellering og datavitenskap
Credits
10.0 ECTS
Academic year
2023/2024

Introduction

Data is the new oil: it powers industries, drives trillion-euro companies, and helps governments make decisions that affect the lives and welfare of billions of people around the world. To do so, however, data must be refined, properly analysed, and presented so that the relevant decision makers can make sense of it and use it in a way that delivers value to society. Data Science is the field of study that focuses on collecting, organizing, cleaning, understanding, transforming, using and presenting data so that it becomes useful.

In this course you will learn what Data Science is and how to approach problems in Data Science so that it can contribute towards a sustainable future. We will briefly question some common ideas about what science is and how scientific research is done. We will also examine what makes a research method suitable or unsuitable, focusing on specific cases to learn from successes and disasters in the history of Data Science.

You will learn the methods, potential and limits of Data Science, as well as how to apply them to real-world challenges using a scripting language (Python, MATLAB or R). The course is designed to provide a solid theoretical introduction to the subject and to build foundational skills through hands-on experience. To achieve that, you will use open data sources to develop a data science project, from data collection to the presentation of insights.

Recommended prior knowledge

Basic algebra, basic mathematical analysis and statistics are highly recommended, though a short overview of the fundamentals of these topics will be provided. The course has a practical part that uses code written in Python, MATLAB or R. Familiarity with these languages is not required, but some experience with a similar programming language is recommended.

Learning outcomes

After completing this course, the student should have the following learning outcomes:

Knowledge

Upon successful completion of the course, the candidate will have knowledge of:

  • the most commonly used methods in data science to clean, impute, analyse and present data;
  • the context in which these methods should be applied;
  • the specific cautions and pitfalls that should be taken into account throughout the entire research process, particularly when using tools from statistical analysis;
  • practical data problems in different fields of science, ranging from fundamental and natural sciences to social sciences and engineering;
  • how statistical analysis can be used to uncover the features and properties of a specific set of data;
  • the main features and techniques one should be aware of for data collection;
  • programming languages applicable to data analysis and modelling.
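
As an illustration of what cleaning and imputing data can look like in a scripting language, here is a minimal sketch in Python using only the standard library. It is not taken from the course material: the function name `clean_and_impute`, the choice of median imputation and the sample values are all assumptions made for this example.

```python
from statistics import median


def clean_and_impute(raw_values):
    """Parse raw readings as floats and fill missing entries with the median.

    Hypothetical helper for illustration; median imputation is only one of
    several possible strategies and is not prescribed by the course.
    """
    parsed = []
    for value in raw_values:
        try:
            parsed.append(float(value))
        except (TypeError, ValueError):
            parsed.append(None)  # mark unparseable or missing entries

    observed = [v for v in parsed if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")

    fill = median(observed)
    return [fill if v is None else v for v in parsed]


# Invented sample column with empty, missing and malformed entries.
raw = ["3.0", "", None, "5.0", "n/a", "4.0"]
cleaned = clean_and_impute(raw)
print(cleaned)
```

Whether filling gaps with the median is appropriate depends on why the values are missing, which is one of the pitfalls listed above.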

Skills

Upon successful completion of the course, the candidate will be able to:

  • use a scripting programming language to perform basic data science operations;
  • translate problems into research questions and evaluate their soundness;
  • propose a first experimental design to approach specific research questions;
  • critically assess the quantitative analysis presented for a research question, including the authors’ interpretation of the results, e.g. regarding the correlation between different variables, their possible functional relations and the statistical significance of the overall results;
  • develop a computational framework to generate surrogate data sets with particular statistical features, as numerical experiments for testing specific data models;
  • apply statistical analysis and mathematical modelling techniques to data from their field of study.
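
The ideas of surrogate data and statistical significance mentioned above can be sketched as follows. This is an illustrative example only, not the method used in the course: it assumes shuffled surrogates (which keep the values of a variable but destroy its pairing with the other variable) and a simple permutation test of the Pearson correlation. All function names and the synthetic data are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def shuffled_surrogate(values, rng):
    """Return a shuffled copy: same values, but the original order is lost."""
    surrogate = list(values)
    rng.shuffle(surrogate)
    return surrogate


def permutation_p_value(x, y, n_surrogates=2000, seed=0):
    """Estimate how often surrogates correlate at least as strongly as the data."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    hits = sum(
        abs(pearson(x, shuffled_surrogate(y, rng))) >= observed
        for _ in range(n_surrogates)
    )
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_surrogates + 1)


# Synthetic data: y depends linearly on x plus Gaussian noise (an assumption).
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [2.0 * a + rng.gauss(0, 1) for a in x]

r = pearson(x, y)
p = permutation_p_value(x, y)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

A small p-value here only says that the observed correlation is unlikely under random pairing; it says nothing about causation or about the functional form of the relation.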

General competence

Upon successful completion of the course, the student:

  • will be able to construct and establish a research plan;
  • will be able to create a data analysis pipeline where data is refined and transformed through scripts;
  • will be able to carry out basic quantitative analysis of their results;
  • will have a critical understanding of the limitations and possibilities of big datasets and statistical analysis.
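
A data analysis pipeline of the kind described above can be thought of as a sequence of small scripted steps, each taking the output of the previous one. The sketch below is one possible way to structure this in Python with the standard library. The step names, the `run_pipeline` helper and the inline CSV data are invented for illustration and do not come from the course.

```python
import csv
import io
from statistics import mean

# Invented example data; a real project would read an open dataset instead.
RAW_CSV = """city,temperature_c
Oslo,4.5
Bergen,
Oslo,6.5
Bergen,8.0
"""


def load(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def drop_missing(rows):
    """Remove rows whose temperature field is empty."""
    return [row for row in rows if row["temperature_c"].strip()]


def to_numeric(rows):
    """Convert the temperature field from text to float."""
    return [{**row, "temperature_c": float(row["temperature_c"])} for row in rows]


def mean_by_city(rows):
    """Aggregate the mean temperature per city."""
    groups = {}
    for row in rows:
        groups.setdefault(row["city"], []).append(row["temperature_c"])
    return {city: mean(values) for city, values in groups.items()}


def run_pipeline(data, steps):
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        data = step(data)
    return data


summary = run_pipeline(RAW_CSV, [load, drop_missing, to_numeric, mean_by_city])
print(summary)
```

Keeping each step as a separate function makes it easy to test, reorder or replace individual stages, for example swapping `drop_missing` for an imputation step.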

Teaching and learning methods

This course will feature lectures and lab work to provide both theoretical and hands-on content. Students will work individually or in groups to complete the assignments given to them. Students are expected to supplement the lectures and lab work by reading the recommended literature.

Course requirements and compulsory activities

The following coursework is compulsory and must be approved before the student can take the exam:

Mandatory assignment 1: Students will select an open dataset and a research problem of their choice, study it carefully in the light of the scientific literature, and submit a text (300-500 words) explaining the reasons for their choice and how the dataset could be used to create value for society or to support an existing or future business.

Mandatory assignment 2: Building upon assignment 1, students will create and present a data analysis pipeline using the chosen dataset and self-selected problem. The pipeline should be implemented in code using the data analysis and scripting techniques taught during the course.

Assessment and exam

This is a portfolio exam based on the data analysis pipeline developed in the mandatory assignments and on its results.

The portfolio will consist of two parts, a report and a presentation:

  1. The report is a careful description of the work done during the semester. It should contain the code, graphs and notes, together with a sample of the dataset.
  2. The presentation lasts at most 20 minutes. It should present the content of the report as a coherent narrative and clarify any unclear steps in the data processing, analysis, results or conclusions.

The portfolio will be assessed as a whole.

The exam result cannot be appealed.

Permitted exam materials and equipment

All support materials are allowed for both the oral presentation and the individual written summary.

Grading scale

The final assessment will be graded on a grading scale from A to E (A is the highest grade and E the lowest) and F for fail.

Examiners

Two examiners will be used, one of whom may be external. An external examiner is used regularly.

Course overlap

Overlaps 90% with STKD6060.