Text Mining
A graduate-level algorithms course focusing on advanced algorithmic techniques, complexity analysis, and specialized algorithms for specific problem domains. Students will implement algorithms and analyze their performance through practical assignments.
Overview
This course is an introduction of the methods and theory of quantitative analysis of documents and natural language processing. Students will learn to use modern tools and Python packages to analyze texts as individual words, sentences, and entire documents. Topics may include word counting, topic modelling, machine translation, regular expressions, and sentiment analysis. Prior programming experience is useful. This course is only open to students with sophomore standing or higher.
Course Content
Algorithmic Paradigms
- Dynamic Programming Advanced Techniques
- Amortized Analysis
- Randomized Algorithms
- Approximation Algorithms
- Online Algorithms
Complexity Theory
- NP-Completeness (review and extensions)
- PSPACE and EXPTIME
- Parameterized Complexity
- Quantum Computing Complexity
Domain-Specific Algorithms
- Network Flow Algorithms
- Computational Geometry
- String Algorithms
- Graph Algorithms
- Parallel Algorithms
Assignments
Students will complete:
- Theoretical Problem Sets (40%): Four problem sets focusing on algorithm design and analysis
- Implementation Projects (30%): Two programming projects implementing and analyzing advanced algorithms
- Research Paper (20%): An in-depth exploration of a specialized algorithm or algorithmic technique
- Class Participation (10%): Including algorithm presentations and discussions
Class Structure
Each week consists of:
- Two 90-minute lectures (Tuesday/Thursday)
- One optional recitation session (Friday)
- Weekly office hours
Prerequisites
- Undergraduate Algorithms (equivalent to CS 330)
- Discrete Mathematics
- Strong programming skills (preferably in C++, Java, or Python)
- Mathematical maturity
Course Materials
Required Textbooks
- “Algorithm Design” by Kleinberg and Tardos
- “The Algorithm Design Manual” by Skiena
Supplementary Resources
- “Introduction to Algorithms” by Cormen, Leiserson, Rivest, and Stein
- “Randomized Algorithms” by Motwani and Raghavan
- “Computational Geometry: Algorithms and Applications” by de Berg et al.
Weekly Schedule
Week | Topic | Reading |
---|---|---|
1 | Review of Algorithm Analysis | Kleinberg Ch. 1-2 |
2 | Advanced Dynamic Programming | Kleinberg Ch. 6 |
3 | Amortized Analysis | Skiena Ch. 10 |
4 | Randomized Algorithms I | Motwani Ch. 1-2 |
5 | Randomized Algorithms II | Motwani Ch. 3-4 |
6 | NP-Completeness Review | Kleinberg Ch. 8 |
7 | Approximation Algorithms | Kleinberg Ch. 11 |
8 | Network Flow Advanced Topics | Kleinberg Ch. 7 |
9 | Computational Geometry Foundations | de Berg Ch. 1-2 |
10 | String Algorithms | Skiena Ch. 18 |
11 | Advanced Graph Algorithms | Skiena Ch. 15-16 |
12 | Parallel Algorithms | Selected Papers |
13 | Quantum Algorithms | Selected Papers |
14 | Special Topics & Research Presentations | - |
Research Project Guidelines
The research project requires students to:
- Select an advanced algorithm not covered in depth in the course
- Research its theoretical foundations
- Implement the algorithm (when applicable)
- Analyze its performance empirically
- Write a technical paper (8-10 pages) and present findings
Example topics include:
- Spectral graph algorithms
- Algorithmic game theory
- Machine learning algorithms
- Cryptographic algorithms
- Quantum computing algorithms