Data Mining and Knowledge Discovery
Professors:
Prerequisites:
Algorithmics, data structures, statistics
Pedagogical objectives:
The course aims to present data mining and knowledge discovery concepts, methods and techniques.
Evaluation modalities:
The evaluation will be based on a project implementation, report presentations and/or a written exam.
Description:
Students will learn various data analysis techniques and apply them to solve data mining problems using specialized software systems and tools.
Topics:
- Introduction
- Concept description and definitions
- Data preparation
- Discovering, ingesting, and exploring data
- Transforming data into analytics-ready data
- Association rules
- Clustering
- Classification
- Data mining
- Model assessment and validation
Complementary content:
- Network analysis
- Process Mining: Event Logs, Process Discovery, Conformance Checking, Log-based Verification (Toolset: ProM, Disco and Celonis)
- Data Warehousing: ETL Process, Data Warehouse Components & Architecture, Multidimensional Data Model, ROLAP
Bibliography:
- S. Chakrabarti et al., Data Mining: Know It All, Morgan Kaufmann, 2009.
- K. Cios, W. Pedrycz, R. Swiniarski, L. Kurgan, Data Mining: A Knowledge Discovery Approach, Springer, 2007.
- J. Han, M. Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann, 2006.
- P. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining, Addison Wesley, 2006.
- D. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, John Wiley & Sons, 2005.
- J. Han, M. Kamber, Data Mining: Concepts and Techniques, 1st Edition, Morgan Kaufmann, 2000.
- Weka system and documentation (http://www.cs.waikato.ac.nz/ml/weka/).
- A. Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, Sebastopol, CA, 2017. ISBN 9781491962299.
- H. Mohanty, P. Bhuyan, D. Chenthati, Big Data: A Primer, Springer India, New Delhi, 2015. ISBN 9788132224945.
- J. Leskovec, A. Rajaraman, J. D. Ullman, Mining of Massive Datasets, 2nd Edition, Cambridge University Press, New York, NY, 2014. ISBN 9781107077232.
- R. Garreta, G. Moncecchi, Learning scikit-learn: Machine Learning in Python, Packt Publishing, Birmingham, 2013. ISBN 978178328193.
Tools:
- Data Mining: RapidMiner
- Process Mining: ProM, Disco, Celonis
Facilities:
- Laboratory-Based Course Structure
- Open-Source Software Requirements