Social Media and Text Analytics 2401-CS-SMaTA-s2
Complete description of the subject
1. Introduction. Introduction to social media and natural language processing research. Overview of the course. (1 week)
2. Introduction to different sources of text (e.g. Twitter), with an introduction to the process of Twitter sentiment analysis. (1 week)
3. Natural language processing. Brief discussion of tokenization, normalization, POS tagging, text cleaning, and entity recognition. Document representation: how to represent unstructured text (BOW, TF-IDF). (2 weeks)
4. Text categorization, including discussion of several basic supervised text categorization algorithms: Naïve Bayes, kNN, and logistic regression (and SVM, if time allows). (2 weeks)
5. Text clustering. This refers to the task of identifying the clustering structure of a corpus of text documents. (2 weeks)
6. Twitter sentiment analysis. (2 weeks)
7. Topic modeling, including discussion of the process of uncovering the hidden thematic structure in a set of documents. The general idea of topic modeling will be introduced with two basic algorithms: LSI and LDA. (2 weeks)
8. The role of information visualization in text mining, with an introduction to Python tools that help visualize a large collection of text documents. (1 week)
9. Student project presentations. (2 weeks)
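As a small illustration of the document representations named in topic 3 (BOW and TF-IDF), the sketch below builds both for a toy corpus using only the Python standard library. It is a minimal sketch, not course material: the function names, the whitespace tokenizer, and the example sentences are assumptions made here, and the weighting shown (relative term frequency times log(N / df)) is only one common TF-IDF variant. Library implementations often use smoothed or normalized versions.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    # A real pipeline would also normalize and clean the text.
    return text.lower().split()


def bag_of_words(docs):
    # One Counter of raw term counts per document (the BOW representation).
    return [Counter(tokenize(doc)) for doc in docs]


def tf_idf(docs):
    # TF-IDF with tf = count / document length and idf = log(N / df).
    bows = bag_of_words(docs)
    n_docs = len(bows)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for bow in bows:
        df.update(bow.keys())

    weights = []
    for bow in bows:
        total = sum(bow.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weights


# Toy corpus (hypothetical example sentences).
docs = ["good movie", "bad movie", "good good plot"]
bows = bag_of_words(docs)
weights = tf_idf(docs)
```

With this variant, a term that occurs in every document gets weight 0, while a term unique to one document (such as "plot" above) gets the largest idf factor.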
Total student workload
Learning outcomes - knowledge
Learning outcomes - skills
Learning outcomes - social competences
Teaching methods
Prerequisites
Course coordinators
Assessment criteria
Assessment methods, e.g.:
- written examination: W1, W2, W3, W4, W5
- student project: U1, U2, U3, K1, K2, K3, W5
- activity: K1, K2, K3
fail - under 50%
satisfactory - 50% - 60% (including 60%)
satisfactory plus - 60% - 70% (including 70%)
good - 70% - 80% (including 80%)
good plus - 80% - 90% (including 90%)
very good - above 90%
Literature
1. Ian H. Witten, Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”
2. Steven Bird, Ewan Klein, Edward Loper, “Natural Language Processing with Python”
3. Christopher D. Manning, Hinrich Schütze, “Foundations of Statistical Natural Language Processing”
4. Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, “Introduction to Information Retrieval”