In progress

Pathway: Introduction to Data Mining

Overview


The Introduction to Data Mining pathway teaches you how to use the data mining methodology to analyze structured, semi-structured, and unstructured data. The pathway consists of three courses: Classification, Clustering and Association, and Text Mining. You will learn how to develop data mining workflows using the KNIME open source software platform. You are not required to write any code, although KNIME lets you use open source programming languages and commercial software environments (R, Weka, Matlab, Python, Java, ...) and access data from platforms such as Twitter and Google.

Language
English
Category
Computer Science, Data Management and Analysis
Enrollment
from 4 Apr 2016
Active
from 22 Apr 2016

Courses


Status
Soft Tutoring
Duration
4 weeks
Workload
11 hours/week
Category
Computer Science, Data Management and Analysis
Language
English
Type
Online
Level
Basic

Participation and Certificates

Enrollment fee
FREE!

Course Description

Description in the course language:

This course introduces basic concepts and methods of Data Mining with specific reference to Classification. In particular, the course covers the following general concepts: data types, summarization, missing data replacement, and data pre-processing. Classification is introduced together with the following concepts: explanatory and class attributes, training and test data sets, classifier (learner and inducer), performance measures (accuracy, error, precision, and recall), k-fold cross-validation, overfitting and underfitting, the curse of dimensionality, cost matrix, receiver operating characteristic curve, lift and cumulative gains charts, irrelevant and redundant attributes, and feature selection. The following classification models are described: decision trees, logistic regression, support vector machines, multilayer perceptron, naïve Bayes, tree augmented naïve Bayes, and Bayesian classifiers. The course is as self-contained as possible and does not require any programming skills. Indeed, the KNIME open source software platform, which is built on the concept of a graphical workflow, is used to mine datasets consisting of different data types.
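
The course itself uses KNIME workflows and requires no programming. Purely as an illustration of the performance measures listed above, the following Python sketch computes accuracy, error, precision, and recall for a binary problem. The function names and the tiny fraud-detection labels are made up for this example and are not taken from the course material.

```python
def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def performance(actual, predicted, positive):
    """Return accuracy, error, precision, and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error": 1.0 - accuracy,
        # Guard against division by zero when no positives are predicted/present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical test-set labels (illustrative only).
actual = ["fraud", "ok", "ok", "fraud", "ok", "ok", "fraud", "ok"]
predicted = ["fraud", "ok", "fraud", "fraud", "ok", "ok", "ok", "ok"]

measures = performance(actual, predicted, positive="fraud")
print(measures)
```

On this toy data, 6 of 8 records are classified correctly, while 2 of the 3 predicted frauds are real and 2 of the 3 real frauds are found.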

Description in English:

Learn how to formulate and solve classification problems for use in Data Mining and Business Intelligence applications such as fraud detection, customer churn, network intrusion detection, ... You will learn how to develop, validate, and apply a data mining workflow to solve binary and non-binary classification problems. The course is self-contained and does not require any programming skills. Hands-on lectures are based on the KNIME open source software platform.


Status
Soft Tutoring
Duration
4 weeks
Workload
10 hours/week
Category
Computer Science, Data Management and Analysis
Language
English
Type
Online
Level
Basic

Participation and Certificates

Enrollment fee
FREE!

Course Description

Description in the course language:

This course introduces basic concepts and methods of Data Mining with specific reference to Clustering and Association Rules. We present the concept and purposes of cluster analysis, together with its main components. Partitioning, hierarchical, density-based, and graph-based clustering methods are described. Particular attention is devoted to cluster validity measures and clustering validation. The last part of the course introduces association rule discovery. The concepts of association rule, frequent itemset, support, and confidence are given. Furthermore, we give a brief description of the Apriori algorithm for frequent itemset generation and introduce the concepts of maximal and closed frequent itemsets. Finally, different criteria for evaluating the quality of association patterns are introduced.
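
As a small illustration of the support and confidence concepts mentioned above (the course itself works in KNIME, without code), the Python sketch below evaluates one association rule on a handful of market-basket transactions. The transactions and function names are invented for this example. It is not an implementation of Apriori, only of the two measures.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_support = support(antecedent, transactions)
    if antecedent_support == 0:
        return 0.0
    rule_items = set(antecedent) | set(consequent)
    return support(rule_items, transactions) / antecedent_support


# Hypothetical market-basket data (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here the itemset {diapers, beer} appears in 3 of 5 transactions, and 3 of the 4 transactions containing diapers also contain beer.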

Description in English:

Learn how to formulate and solve Clustering problems and Association Rule extraction problems for use in Data Mining and Business Intelligence applications. Clustering and Association Rule extraction can help solve problems such as store layout, customer profiling, targeted marketing, market basket analysis, ... You will learn how to develop, validate, and apply Data Mining workflows to solve clustering and association rule extraction problems. The course is self-contained, and hands-on lectures are based on the KNIME open source software platform.


Status
Soft Tutoring
Duration
4 weeks
Workload
9 hours/week
Category
Computer Science, Data Management and Analysis
Language
English
Type
Online
Level
Basic

Participation and Certificates

Enrollment fee
FREE!

Course Description

Description in the course language:

This course introduces basic concepts and methods of Text Mining with specific reference to natural language text preprocessing, text categorization, text clustering, and information extraction. We present basic natural language text preprocessing techniques such as tokenization, filtering, stemming, disambiguation, and sentence boundary determination. We describe how to formulate and solve text categorization problems using models and methods from supervised classification. We show how to exploit text clustering for automatically organizing natural language text. We introduce state-of-the-art document organization models, namely topic models, and address the problem of selecting the “optimal number of topics” (whatever that means) for a given natural language text corpus. Finally, we introduce different information extraction tasks, such as named entity recognition, noun phrase coreference resolution, semantic role recognition, entity relation recognition, and timex and timeline recognition. Furthermore, we describe different instances of an Information Extraction system.
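
To make the first two preprocessing steps above concrete, here is a minimal Python sketch of tokenization followed by stop-word filtering. It is an illustration only: the course performs these steps with KNIME, the stop-word list is a tiny made-up sample, and stemming, disambiguation, and sentence boundary determination are deliberately left out.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "are"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def filter_tokens(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


text = "Natural language text is everywhere."
tokens = tokenize(text)
kept = filter_tokens(tokens)
print(tokens)
print(kept)
```

The filtered token list is the kind of input that later steps, such as text categorization or clustering, would typically start from.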

Description in English:

Natural language text is everywhere: social networks, business, finance, medicine, and biology are just a few of the many sources of natural language text. However, computers are not well suited to processing natural language text. Indeed, Data Mining methods and algorithms, which operate on structured data, cannot be directly applied to unstructured data for knowledge extraction. The Text Mining course, the last one in the Introduction to Data Mining pathway, introduces methods and tools for knowledge extraction from natural language text. The course assumes you are familiar with the methods and models presented in the previous two courses, namely Data Mining: Classification and Data Mining: Clustering and Association. The course shows that Text Mining makes it possible to formulate and solve problems in Business Intelligence, Finance, Recommendation, Medicine, Biomedicine, Social Networks, and Intelligence Gathering, to mention just a few. In particular, the course introduces methods, models, and algorithms for natural language text preprocessing, text categorization, text clustering, topic modeling, and information extraction.


Status
Soft Tutoring
Duration
1 week
Workload
1 hour/week
Category
Computer Science, Data Management and Analysis
Language
English
Type
Online
Level
Basic

Participation and Certificates

Enrollment fee
FREE!

Course Description

Description in English:

The Introduction to Data Mining pathway teaches you how to use the data mining methodology to analyze structured, semi-structured, and unstructured data. The pathway consists of three courses: Classification, Clustering and Association, and Text Mining. You will learn how to develop data mining and text mining workflows using the KNIME open source software platform. You are not required to write any code, although KNIME lets you use open source programming languages and commercial software environments (R, Weka, Matlab, Python, Java, ...) and access data from platforms such as Twitter and Google.

Course sequence


You should take the three courses of the Pathway in the following order: Classification, Clustering and Association, and Text Mining. However, if you know how to use the KNIME open source platform and have basic knowledge of the R programming language, no particular order is required.

Evaluation and Certificates


Each course of the Pathway issues an Attendance Certificate and a Badge if the following conditions are fulfilled: all practice sessions associated with each lecture are completed, and the KNIME workflow associated with each practice session is uploaded to the course platform.

Requirements


Basic knowledge of probability, statistics, and mathematics.

FABIO STELLA

Department of Informatics, Systems and Communication