Text Mining

Course Number: DIDA 310
Semester: Summer 2025
Level: Undergraduate
A graduate-level algorithms course focusing on advanced algorithmic techniques, complexity analysis, and specialized algorithms for specific problem domains. Students will implement algorithms and analyze their performance through practical assignments.

Overview

This course is an introduction of the methods and theory of quantitative analysis of documents and natural language processing. Students will learn to use modern tools and Python packages to analyze texts as individual words, sentences, and entire documents. Topics may include word counting, topic modelling, machine translation, regular expressions, and sentiment analysis. Prior programming experience is useful. This course is only open to students with sophomore standing or higher.

Course Content

Algorithmic Paradigms

  • Dynamic Programming Advanced Techniques
  • Amortized Analysis
  • Randomized Algorithms
  • Approximation Algorithms
  • Online Algorithms

Complexity Theory

  • NP-Completeness (review and extensions)
  • PSPACE and EXPTIME
  • Parameterized Complexity
  • Quantum Computing Complexity

Domain-Specific Algorithms

  • Network Flow Algorithms
  • Computational Geometry
  • String Algorithms
  • Graph Algorithms
  • Parallel Algorithms

Assignments

Students will complete:

  1. Theoretical Problem Sets (40%): Four problem sets focusing on algorithm design and analysis
  2. Implementation Projects (30%): Two programming projects implementing and analyzing advanced algorithms
  3. Research Paper (20%): An in-depth exploration of a specialized algorithm or algorithmic technique
  4. Class Participation (10%): Including algorithm presentations and discussions

Class Structure

Each week consists of:

  • Two 90-minute lectures (Tuesday/Thursday)
  • One optional recitation session (Friday)
  • Weekly office hours

Prerequisites

  • Undergraduate Algorithms (equivalent to CS 330)
  • Discrete Mathematics
  • Strong programming skills (preferably in C++, Java, or Python)
  • Mathematical maturity

Course Materials

Required Textbooks

  • “Algorithm Design” by Kleinberg and Tardos
  • “The Algorithm Design Manual” by Skiena

Supplementary Resources

  • “Introduction to Algorithms” by Cormen, Leiserson, Rivest, and Stein
  • “Randomized Algorithms” by Motwani and Raghavan
  • “Computational Geometry: Algorithms and Applications” by de Berg et al.

Weekly Schedule

WeekTopicReading
1Review of Algorithm AnalysisKleinberg Ch. 1-2
2Advanced Dynamic ProgrammingKleinberg Ch. 6
3Amortized AnalysisSkiena Ch. 10
4Randomized Algorithms IMotwani Ch. 1-2
5Randomized Algorithms IIMotwani Ch. 3-4
6NP-Completeness ReviewKleinberg Ch. 8
7Approximation AlgorithmsKleinberg Ch. 11
8Network Flow Advanced TopicsKleinberg Ch. 7
9Computational Geometry Foundationsde Berg Ch. 1-2
10String AlgorithmsSkiena Ch. 18
11Advanced Graph AlgorithmsSkiena Ch. 15-16
12Parallel AlgorithmsSelected Papers
13Quantum AlgorithmsSelected Papers
14Special Topics & Research Presentations-

Research Project Guidelines

The research project requires students to:

  1. Select an advanced algorithm not covered in depth in the course
  2. Research its theoretical foundations
  3. Implement the algorithm (when applicable)
  4. Analyze its performance empirically
  5. Write a technical paper (8-10 pages) and present findings

Example topics include:

  • Spectral graph algorithms
  • Algorithmic game theory
  • Machine learning algorithms
  • Cryptographic algorithms
  • Quantum computing algorithms