Pathway in Introduction to Data Mining
The Introduction to Data Mining pathway teaches you how to use the data mining methodology to analyze structured, semi-structured, and unstructured data. The pathway consists of the following courses: Classification, Clustering and Association, and Text Mining. You will learn how to develop data mining workflows using the KNIME open source software platform. You are not required to write any programs, although KNIME allows you to use open source programming languages and powerful commercial software environments (R, Weka, Matlab, Python, Java, ...) and to access data from platforms such as Twitter and Google.
Course Description
Description in the course language: This course introduces basic concepts and methods of Data Mining with specific reference to Classification. In particular, the course covers the following general concepts: data types, summarization, missing data replacement, and data pre-processing. Classification is introduced together with the following concepts: explanatory and class attributes, training and test data sets, classifier (learner and inducer), performance measures (accuracy, error, precision, and recall), k-fold cross-validation, overfitting and underfitting, curse of dimensionality, cost matrix, receiver operating characteristic curve, lift and cumulative gains charts, irrelevant and redundant attributes, and feature selection. The following classification models are described: decision trees, logistic regression, support vector machines, multilayer perceptron, naïve Bayes, tree augmented naïve Bayes, and Bayesian classifiers. The course is as self-contained as possible and does not require any programming skills. Indeed, the KNIME open source software platform, which is built on the concept of a graphical workflow, is used to mine datasets consisting of different data types.
Description in English: Learn how to formulate and solve classification problems for use in Data Mining and Business Intelligence applications such as fraud detection, customer churn, network intrusion detection, ... You will learn how to develop, validate, and apply a data mining workflow to solve binary and non-binary classification problems. The course is self-contained and does not require any programming skills. Hands-on lectures are based on the KNIME open source software platform.
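The course itself uses KNIME workflows and requires no coding. Purely as an optional illustration, the performance measures named in the description above (accuracy, error, precision, and recall) can be sketched in a few lines of Python for a binary problem. The function name `binary_metrics` and the sample labels are hypothetical and are not part of the course material.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, error, precision and recall for a binary classifier.

    y_true and y_pred are equal-length sequences of class labels;
    `positive` is the label treated as the positive class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # true negatives

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical example: 1 = fraud, 0 = legitimate transaction.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = binary_metrics(actual, predicted)
print(metrics)
```

In this made-up sample there are three true positives, one false positive, one false negative, and three true negatives, so accuracy, precision, and recall all equal 0.75.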
Course Description
Description in the course language: This course introduces basic concepts and methods of Data Mining with specific reference to Clustering and Association Rules. We present the concept and purposes of cluster analysis, together with its main components. Partitioning, hierarchical, density-based, and graph-based clustering methods are described. Particular attention is devoted to cluster validity measures and clustering validation. The last part of the course introduces association rule discovery. The concepts of association rule, frequent itemset, support, and confidence are given. Furthermore, we give a brief description of the Apriori algorithm for frequent itemset generation and introduce the concepts of maximal and closed frequent itemsets. Finally, different criteria for evaluating the quality of association patterns are introduced.
Description in English: Learn how to formulate and solve Clustering problems and Association Rule extraction problems for use in Data Mining and Business Intelligence applications. Clustering and Association Rule extraction are useful for solving problems such as store layout, customer profiling, targeted marketing, market basket analysis, ... You will learn how to develop, validate, and apply Data Mining workflows to solve clustering and association rule extraction problems. The course is self-contained, and hands-on lectures are based on the KNIME open source software platform.
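As an optional illustration of the support and confidence concepts mentioned above, the following minimal Python sketch computes both for a small, made-up list of market baskets. The helper names `support` and `confidence` and the sample transactions are hypothetical; the course itself works with KNIME workflows rather than code.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def support(transactions, itemset):
    """Fraction of transactions that contain the whole itemset."""
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent (rule: antecedent -> consequent)."""
    n_antecedent = count_containing(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0  # rule never applies in this data
    n_both = count_containing(transactions, set(antecedent) | set(consequent))
    return n_both / n_antecedent


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

print(support(baskets, {"diapers", "beer"}))       # 3 of 5 baskets
print(confidence(baskets, {"diapers"}, {"beer"}))  # 3 of the 4 baskets with diapers
```

An algorithm such as Apriori uses a minimum support threshold to prune candidate itemsets before rules are scored by confidence; this sketch only shows the two measures themselves.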
You should attend the three courses of the Pathway in the following order: Classification, Clustering and Association, and Text Mining. However, if you know how to use the KNIME open source platform and have basic knowledge of the R programming language, no special order applies.
Evaluation and Certificates
Each course of the Pathway issues an Attendance Certificate and a Badge provided that the following conditions are fulfilled: all practice sessions associated with each lecture are completed, and the KNIME workflow associated with each practice session is uploaded to the course platform.