How to Cite Pauhu

Proper citation guidelines for academic publications using Pauhu data and services.


Citation Requirement

All Pauhu Research License holders must cite Pauhu in any publication that uses Pauhu data or services. Attribution is required by the BY condition of the CLARIN ACA+BY+NC+NORED license terms.


Quick Citation

For most publications, use:

Pauhu AI Ltd. (2026). Pauhu Enriched Parallel Corpus [Data set]. https://pauhu.com


Citation by Resource Type

Parallel Corpora (General)

BibTeX:

@misc{pauhu_corpus_2026,
  author       = {{Pauhu AI Ltd}},
  title        = {Pauhu Enriched Parallel Corpus},
  year         = {2026},
  publisher    = {Pauhu AI Ltd},
  address      = {Helsinki, Finland},
  url          = {https://pauhu.com},
  note         = {Licensed under CLARIN ACA+BY+NC+NORED}
}

APA 7th Edition:

Pauhu AI Ltd. (2026). Pauhu Enriched Parallel Corpus [Data set].
    https://pauhu.com

Domain-Specific Corpora

When citing a specific EuroVoc domain:

BibTeX:

@misc{pauhu_law_corpus_2026,
  author       = {{Pauhu AI Ltd}},
  title        = {Pauhu Law Enriched Parallel Corpus},
  year         = {2026},
  publisher    = {Pauhu AI Ltd},
  address      = {Helsinki, Finland},
  url          = {https://pauhu.com/lds/catalogue/},
  note         = {EuroVoc Domain 12 (Law). Licensed under
                  CLARIN ACA+BY+NC+NORED}
}
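If you cite several domain subsets, generating the entries from one template keeps them consistent. The following is a minimal Python sketch, not an official Pauhu tool. The function name `domain_bibtex` is hypothetical, and the key and title patterns are assumptions inferred from the Law example above; check the catalogue for the exact title of each subset before publishing.

```python
def domain_bibtex(domain_id: int, domain_name: str, year: int = 2026) -> str:
    """Build a BibTeX @misc entry for one EuroVoc domain subset.

    Assumption: key and title follow the pattern of the Law example
    ("pauhu_law_corpus_2026", "Pauhu Law Enriched Parallel Corpus").
    """
    key = f"pauhu_{domain_name.lower().replace(' ', '_')}_corpus_{year}"
    fields = [
        # Extra braces keep BibTeX from parsing the company name as a person.
        ("author", "{Pauhu AI Ltd}"),
        ("title", f"Pauhu {domain_name} Enriched Parallel Corpus"),
        ("year", str(year)),
        ("publisher", "Pauhu AI Ltd"),
        ("address", "Helsinki, Finland"),
        ("url", "https://pauhu.com/lds/catalogue/"),
        ("note", f"EuroVoc Domain {domain_id} ({domain_name}). "
                 "Licensed under CLARIN ACA+BY+NC+NORED"),
    ]
    # Pad field names so the "=" signs line up, as in the entries above.
    body = ",\n".join(f"  {name:<12} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


# Reproduces the Law entry (domain number and name taken from the example).
entry = domain_bibtex(12, "Law")
print(entry)
```

Only the domain number and name vary between entries; the publisher, address, URL, and license note are copied from the Law example.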

Morphology Data

BibTeX:

@misc{pauhu_morphology_2026,
  author       = {{Pauhu AI Ltd}},
  title        = {Pauhu Morphology Data},
  year         = {2026},
  publisher    = {Pauhu AI Ltd},
  url          = {https://pauhu.com/api/morphology/},
  note         = {Based on UniMorph 4.0. 163 languages,
                  15.9M word forms. CC BY-SA 4.0}
}

Important: Also cite the original UniMorph project:

@inproceedings{batsuren-etal-2022-unimorph,
  title        = {{UniMorph} 4.0: {U}niversal {M}orphology},
  author       = {Batsuren, Khuyagbaatar and others},
  booktitle    = {Proceedings of the Thirteenth Language Resources and Evaluation Conference ({LREC} 2022)},
  year         = {2022},
  url          = {https://unimorph.github.io/}
}

Translation API

BibTeX:

@misc{pauhu_translation_api_2026,
  author       = {{Pauhu AI Ltd}},
  title        = {Pauhu Translation API},
  year         = {2026},
  publisher    = {Pauhu AI Ltd},
  url          = {https://pauhu.com/api/translation/},
  note         = {Neural machine translation. 24 EU languages.
                  Helsinki-NLP models with domain fine-tuning}
}

In-Text Citations

Introducing the Data

We used the Pauhu Enriched Parallel Corpus (Pauhu AI Ltd, 2026), which provides EU legal texts with five layers of linguistic annotation...

Methods Section

Data: We obtained English-Finnish parallel data from the Pauhu Law corpus (Pauhu AI Ltd, 2026), specifically the EuroVoc Domain 12 subset containing legal texts...


License Attribution

When you use the data, include a license attribution in this form:

Data: Pauhu Enriched Parallel Corpus
Provider: Pauhu AI Ltd (Helsinki, Finland)
Original source: EUR-Lex (https://eur-lex.europa.eu)
License: CLARIN ACA+BY+NC+NORED
Accessed: [DATE]
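
To avoid leaving the [DATE] placeholder unfilled, the block above can be generated at the time you download the data. This is a minimal Python sketch; the function name `license_attribution` is hypothetical, and the ISO 8601 date format (YYYY-MM-DD) is an assumption, since no date format is prescribed here.

```python
from datetime import date
from typing import Optional


def license_attribution(accessed: Optional[date] = None) -> str:
    """Return the license attribution block with the access date filled in.

    Assumption: the date is written in ISO 8601 form (YYYY-MM-DD).
    """
    accessed = accessed or date.today()  # default to the current date
    lines = [
        "Data: Pauhu Enriched Parallel Corpus",
        "Provider: Pauhu AI Ltd (Helsinki, Finland)",
        "Original source: EUR-Lex (https://eur-lex.europa.eu)",
        "License: CLARIN ACA+BY+NC+NORED",
        f"Accessed: {accessed.isoformat()}",
    ]
    return "\n".join(lines)


# Pass the date you actually retrieved the data; omit it to use today's date.
block = license_attribution(date(2026, 1, 15))
print(block)
```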

DOI Assignment

For formal data citations, we can provide DOIs:

Resource            DOI Status
Full corpus         Available on request
Domain subsets      Available on request
Specific versions   Available on request

Request DOI: research@pauhu.ai


Questions?

Citation support: research@pauhu.ai
Response time: 24 hours

