Social Media and Text Analytics 2401-CS-SMaTA-s2

Complete description of the subject

1. Introduction: an introduction to social media and natural language processing research. Overview of the course. (1 week)
2. Introduction to different sources of text (e.g. Twitter), with an introduction to the process of Twitter sentiment analysis. (1 week)
3. Natural language processing. Brief discussion of tokenization, normalization, POS tagging, text cleaning and entity recognition. Document representation: how to represent unstructured text (BOW, TF-IDF). (2 weeks)
4. Text categorization, including a discussion of several basic supervised text categorization algorithms: Naïve Bayes, kNN, logistic regression (and SVM if time allows). (2 weeks)
5. Text clustering: the task of identifying the clustering structure of a corpus of text documents. (2 weeks)
6. Twitter sentiment analysis. (2 weeks)
7. Topic modeling, including a discussion of the process of uncovering the hidden thematic structure in a set of documents. The general idea of topic modeling will be introduced with two basic algorithms: LSI and LDA. (2 weeks)
8. The role of information visualization in text mining, with an introduction to Python tools that help visualize a large collection of text documents. (1 week)
9. Student project presentations. (2 weeks)

Total student workload

Contact hours with teacher:
- participation in laboratory: 30 hrs
Self-study hours:
- preparation for lectures: 10 hrs
- preparation for test/examination: 30 hrs
Altogether: 70 hrs

Learning outcomes - knowledge

W1: The student knows the basic methods used in NLP
W2: The student knows the areas of application of text processing algorithms
W3: The student knows the criteria for selecting text analysis methods
W4: The student knows the text analysis algorithms (text categorization, clustering, topic modeling)
W5: The student identifies social media as a source of text data

Learning outcomes - skills

U1: The student is able to use basic text processing tools
U2: The student is able to write a simple program using NLP algorithms
U3: The student is able to choose the appropriate algorithm to solve a given problem

Learning outcomes - social competences

K1: The student is able to communicate the effects of text processing algorithms
K2: The student is able to argue for and interpret the results of a text processing algorithm

Teaching methods

1. Observation teaching methods:
- display
2. Expository teaching methods:
- participatory lecture
- problem-based lecture
3. Exploratory teaching methods:
- practical
- brainstorming
- laboratory
- project work
- presentation of a paper

Prerequisites

1. Python: basic level (including basic libraries and packages: numpy, pandas, matplotlib, scikit-learn)
2. Basic knowledge of machine learning
3. Basic knowledge of mathematics

Course coordinators

Joanna Michalak

Assessment criteria

Assessment methods, e.g.:
- written examination: W1, W2, W3, W4, W5
- student project: U1, U2, U3, K1, K2, K3, W5
- activity: K1, K2, K3

fail: below 50%
satisfactory: 50–60% (including 60%)
satisfactory plus: 60–70% (including 70%)
good: 70–80% (including 80%)
good plus: 80–90% (including 90%)
very good: above 90%

Literature

1. Ian H. Witten, Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”
2. Steven Bird, Ewan Klein, Edward Loper, “Natural Language Processing with Python”
3. Christopher D. Manning, Hinrich Schütze, “Foundations of Statistical Natural Language Processing”
4. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, “Introduction to Information Retrieval”

More information

Additional information (e.g. on the registration calendar, class instructors, and the location and schedule of classes) may be available in the USOSweb service:

Course page 2401-CS-SMaTA-s2 in USOSweb