Data Mining and Knowledge Discovery

Year:
1st year
Semester:
S2
Programme main editor:
(I2CAT)
Onsite in:
UBB, UPC
Remote:
ECTS range:
6-7 ECTS

Professors

Anca Andreica
UBB

Prerequisites:

Algorithmics, data structures, statistics

Pedagogical objectives:

The course aims to present data mining and knowledge discovery concepts, methods and techniques.

Evaluation modalities:

The evaluation will be based on a project implementation, report presentations, and/or a written exam.

Description:

Students will learn various data analysis techniques and apply them to solve data mining problems using specialized software systems and tools.

Topics:

  • Introduction
  • Concept description and definitions
  • Data preparation
  • Discovering, ingesting, and exploring data
  • Transforming data into analytics-ready data
  • Association rules
  • Clustering
  • Classification
  • Data mining
  • Model assessment and validation

Complementary content:

  • Network analysis
  • Process Mining: Event Logs, Process Discovery, Conformance Checking, Log-based Verification (Toolset: ProM, Disco and Celonis)
  • Data Warehousing: ETL Process, Data Warehouse Components & Architecture, Multi-dimensional data model, ROLAP

Required teaching material

  • S. Chakrabarti et al., Data Mining. Know It All, Morgan Kaufmann, 2009.
  • K. Cios, W. Pedrycz, R. Swiniarski, L. Kurgan, Data Mining. A Knowledge Discovery Approach, Springer, 2007.
  • J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006.
  • P. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006.
  • D. Larose, Discovering Knowledge in Data. An Introduction to Data Mining, John Wiley & Sons, 2005.
  • J. Han, M. Kamber, Data Mining: Concepts and Techniques, 1st Edition, Morgan Kaufmann, 2000.
  • Weka system and documentation (http://www.cs.waikato.ac.nz/ml/weka/).
  • A. Géron, Hands-On Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, Sebastopol, CA, 2017. ISBN 9781491962299.
  • H. Mohanty, P. Bhuyan, D. Chenthati, Big Data: A Primer, Springer India, New Delhi, 2015. ISBN 9788132224945.
  • J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, 2nd Edition, Cambridge University Press, 2014. ISBN 9781107077232.
  • R. Garreta, G. Moncecchi, Learning scikit-learn: Machine Learning in Python, Packt Publishing, Birmingham, 2013. ISBN 978178328193

Tools:

  • Data Mining: RapidMiner
  • Process Mining: ProM, Disco, Celonis

Teaching volume:
Lessons:
0-28 hours
Exercises:
Supervised lab:
14-54 hours
Project:
0-14 hours

Devices:

  • Laboratory-Based Course Structure
  • Open-Source Software Requirements