(in Polish) Przetwarzanie języka naturalnego 1000-I2PJN

Lecture
The lecture program includes an introduction to text data mining, a presentation of the main sources of such data, and a discussion of the necessity of text preprocessing. The main focus of the lecture is on presenting modern methods for natural language analysis, including small and large language models, and comparing them with traditional NLP methods, along with their applications in text analysis.

Laboratory
The laboratory classes focus on presenting the capabilities of the Python programming language in the area of text data analysis. The topics and algorithms discussed during the lectures will be applied to the analysis of real-world textual data.

Total student workload

Hours conducted with the participation of instructors:
a) lecture – 30 hours
b) laboratory – 30 hours
c) ongoing preparation for classes, including solving assignments given by instructors, reviewing feedback on completed tasks, and consultations with instructors – 30 hours

Time devoted to the student’s individual work required to pass the course:
a) studying the literature – 15 hours
b) preparation of course projects – 30 hours

Time required to prepare for participation in the assessment process:
a) exam preparation – 15 hours

TOTAL: 150 hours (6 ECTS credits)

Learning outcomes - knowledge

W1. Identifies differences between structured and unstructured data, and understands the specific problems and challenges related to processing and analyzing unstructured data (K_W07).
W2. Has knowledge of statistical methods useful in the analysis of unstructured data and is familiar with example applications of these methods that lead to discovering relationships within such data (K_W01, K_W07).
W3. Is familiar with modern language models and their applications (K_W01, K_W02, K_W03, K_W04).
W4. Knows the most important programming tools and libraries intended for processing and analyzing unstructured data (K_W03).
W5. Is familiar with issues related to traditional text processing methods (e.g. classification) (K_W01, K_W04).

Learning outcomes - skills

U1. Is able to collect textual data from publicly available resources (K_U02, K_U03).
U2. Is able to extract key features from text documents and transform them into a vector representation suitable for analysis (K_U03).
U3. Is able to perform classification or clustering of sets of text documents using appropriate algorithms and tools (K_U01, K_U02, K_U03).
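
As an informal illustration of what outcome U2 refers to, the sketch below turns a few short documents into TF-IDF vectors using only the Python standard library. It is not taken from the course materials: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the sample sentences are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    # Real preprocessing would also handle punctuation, stop words, etc.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors), with one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        if not tokens:
            # An empty document maps to the zero vector.
            vectors.append([0.0] * len(vocabulary))
            continue
        counts = Counter(tokens)
        vectors.append([
            (counts[term] / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample documents, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "language models generate text",
]
vocabulary, vectors = tfidf_vectors(docs)
```

Vectors of this kind can then be passed to a classification or clustering algorithm, which is the step described in U3. In practice a library implementation would typically be used in place of hand-written code like this.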

Learning outcomes - social competencies

K1. Is able to formulate a text data mining problem in a way that is understandable both to collaborators working in this area and to expert analysts (K_K01, K_K06).
K2. Is aware of the ethical and legal constraints related to collecting, storing, and analyzing textual data (K_K03).
K3. Understands the need for continuous expansion and updating of knowledge in the field of text data analysis (K_K01, K_K02).

Course coordinators

Łukasz Górski

Teaching methods

Informational lecture (traditional), conversational lecture, case study

Type of course

compulsory course

Prerequisites

Knowledge of the basics of linear algebra, probability theory, and descriptive statistics. Knowledge of at least one programming language (Python recommended). Knowledge of the fundamentals of machine learning.

Assessment criteria

Exam – W1, W2, W3, K1, K3
Programming project – W1, W2, W4, W5, U1, U2, U3, K1, K2
Class participation – K1

Bibliography

Basic literature:
- H. Lane, C. Howard, H. M. Hapke, Przetwarzanie języka naturalnego w akcji, PWN, 2021.
- S. M. Weiss, N. Indurkhya, T. Zhang, Fundamentals of Predictive Text Mining, Second Edition, Springer, 2015.
- Ch. Zong, R. Xia, J. Zhang, Text Data Mining, Springer, 2021.
- L. Gazit, M. Ghaffari, Mastering NLP from Foundations to LLMs, Packt, 2024.
- J. Alammar, M. Grootendorst, Hands-On Large Language Models: Language Understanding and Generation, O’Reilly, 2024.
Supplemental literature:
- S. Vajjala, B. Majumder, A. Gupta, H. Surana, Przetwarzanie języka naturalnego w praktyce. Przewodnik po budowie rzeczywistych systemów NLP, Helion, 2023.
- B. Liu, Sentiment Analysis, Cambridge University Press, 2015.
- S. Raschka, Stwórz własne AI. Jak od podstaw zbudować duży model językowy, Helion, 2025.

Additional information

Additional information (registration calendar, class instructors, location and schedule of classes) may be available in the USOSweb system:

Description of 1000-I2PJN in USOSweb