From Reports to Data: Harnessing LLMs for Fine-Grained Human Rights Data Collection

Abstract

Measuring human rights remains a central challenge in the academic literature. Scholars have extensively debated the limitations of existing metrics and explored alternative strategies, yet fine-grained, reliable data remain scarce. Such data are necessary both to test theoretical claims and to develop early warning systems that help prevent atrocities. Recent advances in large language models (LLMs) offer a new opportunity to address this gap. This paper presents an approach to human rights data collection that uses LLMs to extract daily-level information from reports produced by local non-governmental organizations. Focusing on Turkey as a case study, I compile a dataset of human rights violations spanning 2013 to 2023, yielding approximately 30,000 unique observations. The paper contributes to the ongoing debate on human rights measurement by introducing a scalable and replicable framework for generating high-resolution data.

Citation

Evirgen, Yusuf (2026). From Reports to Data: Harnessing LLMs for Fine-Grained Human Rights Data Collection.
@article{evirgen2026,
  title   = {From Reports to Data: Harnessing LLMs for Fine-Grained Human Rights Data Collection},
  author  = {Yusuf Evirgen},
  journal = {Unpublished Manuscript},
  year    = {2026},
  url     = {https://yusufevirgen.com/research/working-papers/from-reports-to-data/}
}