How to Cite Pauhu
Citation guidelines for academic publications that use Pauhu data and services.
Citation Requirement
All Pauhu Research License holders must cite Pauhu in their publications. Attribution is required by the BY condition of the CLARIN ACA+BY+NC+NORED license terms.
Quick Citation
For most publications, use:
Pauhu AI Ltd. (2026). Pauhu Enriched Parallel Corpus [Data set]. https://pauhu.com
Citation by Resource Type
Parallel Corpora (General)
BibTeX:
@misc{pauhu_corpus_2026,
  author    = {{Pauhu AI Ltd}},
  title     = {Pauhu Enriched Parallel Corpus},
  year      = {2026},
  publisher = {Pauhu AI Ltd},
  address   = {Helsinki, Finland},
  url       = {https://pauhu.com},
  note      = {Licensed under CLARIN ACA+BY+NC+NORED}
}
APA 7th Edition:
Pauhu AI Ltd. (2026). Pauhu Enriched Parallel Corpus [Data set].
https://pauhu.com
Domain-Specific Corpora
When citing a specific EuroVoc domain:
BibTeX:
@misc{pauhu_law_corpus_2026,
  author    = {{Pauhu AI Ltd}},
  title     = {Pauhu Law Enriched Parallel Corpus},
  year      = {2026},
  publisher = {Pauhu AI Ltd},
  address   = {Helsinki, Finland},
  url       = {https://pauhu.com/lds/catalogue/},
  note      = {EuroVoc Domain 12 (Law). Licensed under CLARIN ACA+BY+NC+NORED}
}
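If you cite several domain subsets, a small script can keep the entries consistent. The following is a minimal sketch, not an official Pauhu tool: the function name `domain_bibtex` is hypothetical, and it assumes that other EuroVoc domains follow the same key and title pattern as the Law example. Check the catalogue for the exact title of each subset before using the output.

```python
def domain_bibtex(domain_name: str, domain_id: int, year: int = 2026) -> str:
    """Build a @misc entry for one EuroVoc domain subset.

    Assumption: other domains use the same key and title pattern as the
    Law example; confirm the exact title in the Pauhu catalogue.
    """
    key = f"pauhu_{domain_name.lower().replace(' ', '_')}_corpus_{year}"
    fields = [
        # Inner braces keep BibTeX from parsing the company name as a person.
        ("author", "{Pauhu AI Ltd}"),
        ("title", f"Pauhu {domain_name} Enriched Parallel Corpus"),
        ("year", str(year)),
        ("publisher", "Pauhu AI Ltd"),
        ("address", "Helsinki, Finland"),
        ("url", "https://pauhu.com/lds/catalogue/"),
        ("note", f"EuroVoc Domain {domain_id:02d} ({domain_name}). "
                 "Licensed under CLARIN ACA+BY+NC+NORED"),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


if __name__ == "__main__":
    # Reproduces the fields of the Law entry (EuroVoc Domain 12).
    print(domain_bibtex("Law", 12))
```

Calling `domain_bibtex("Law", 12)` yields an entry with the key `pauhu_law_corpus_2026` and the same field values as the Law example.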
Morphology Data
BibTeX:
@misc{pauhu_morphology_2026,
  author    = {{Pauhu AI Ltd}},
  title     = {Pauhu Morphology Data},
  year      = {2026},
  publisher = {Pauhu AI Ltd},
  url       = {https://pauhu.com/api/morphology/},
  note      = {Based on UniMorph 4.0. 163 languages, 15.9M word forms. CC BY-SA 4.0}
}
Important: Also cite the original UniMorph project:
@inproceedings{batsuren-etal-2022-unimorph,
  title     = {{UniMorph} 4.0: {U}niversal {M}orphology},
  author    = {Batsuren, Khuyagbaatar and others},
  booktitle = {LREC 2022},
  year      = {2022},
  url       = {https://unimorph.github.io/}
}
Translation API
BibTeX:
@misc{pauhu_translation_api_2026,
  author    = {{Pauhu AI Ltd}},
  title     = {Pauhu Translation API},
  year      = {2026},
  publisher = {Pauhu AI Ltd},
  url       = {https://pauhu.com/api/translation/},
  note      = {Neural machine translation. 24 EU languages. Helsinki-NLP models with domain fine-tuning}
}
In-Text Citations
Introducing the Data
We used the Pauhu Enriched Parallel Corpus (Pauhu AI Ltd., 2026), which provides EU legal texts with five layers of linguistic annotation...
Methods Section
Data: We obtained English-Finnish parallel data from the Pauhu Law corpus (Pauhu AI Ltd., 2026), specifically the EuroVoc Domain 12 subset containing legal texts...
License Attribution
When you use the data, include a license attribution in this form:
Data: Pauhu Enriched Parallel Corpus
Provider: Pauhu AI Ltd (Helsinki, Finland)
Original source: EUR-Lex (https://eur-lex.europa.eu)
License: CLARIN ACA+BY+NC+NORED
Accessed: [DATE]
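The `[DATE]` placeholder is the only field that changes between uses, so the block can be generated rather than typed by hand. The sketch below is illustrative only: the function name `license_attribution` is hypothetical, and the ISO 8601 date format (YYYY-MM-DD) is an assumption, since the template does not prescribe one. Follow your publisher's date style if it differs.

```python
from datetime import date

# Template copied from the attribution block above; only the date is filled in.
ATTRIBUTION_TEMPLATE = (
    "Data: Pauhu Enriched Parallel Corpus\n"
    "Provider: Pauhu AI Ltd (Helsinki, Finland)\n"
    "Original source: EUR-Lex (https://eur-lex.europa.eu)\n"
    "License: CLARIN ACA+BY+NC+NORED\n"
    "Accessed: {accessed}"
)


def license_attribution(accessed: date) -> str:
    """Return the attribution block with the access date filled in.

    Assumption: the date is written in ISO 8601 form (YYYY-MM-DD).
    """
    return ATTRIBUTION_TEMPLATE.format(accessed=accessed.isoformat())


if __name__ == "__main__":
    # Use the day the data was actually retrieved, not the publication date.
    print(license_attribution(date.today()))
```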
DOI Assignment
For formal data citations, we can provide DOIs:
| Resource | DOI Status |
|---|---|
| Full corpus | Available on request |
| Domain subsets | Available on request |
| Specific versions | Available on request |
Request DOI: research@pauhu.ai
Questions?
Citation support: research@pauhu.ai
Response time: 24 hours