Corpus

From: Minority Translate
Revision as of 10 January 2016, 19:08, by user Ptinits


One of the aims of the software is to help gather electronic resources for small languages for use in further linguistic research.

For this reason, if you allow it, Minority Translate stores on its servers some basic information about your edits that may not be preserved, or may be difficult to access, in a local Wikipedia, and makes it publicly available to all researchers.

This information includes the time and ID of the edit, the source and target languages used, and whether the text was marked as an exact translation or as a loose adaptation of the sources. It also includes basic usage data, such as the filters that determine how much of the text was available and the operating system used.
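As a rough illustration of the kind of record described above, the following Python sketch models one stored edit and parses a list of such records from JSON. The field names (`edit_id`, `timestamp`, `source_langs`, `target_lang`, `exact`, `filters`, `os`) and the sample values are assumptions made for this example; the actual layout of the Minority Translate data may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class EditRecord:
    # All field names here are hypothetical, chosen to mirror the prose above.
    edit_id: int             # ID of the edit
    timestamp: str           # time of the edit
    source_langs: List[str]  # source languages used
    target_lang: str         # language translated into
    exact: bool              # True = exact translation, False = loose adaptation
    filters: List[str]       # filters that limited how much text was shown
    os: str                  # operating system used


def parse_records(raw: str) -> List[EditRecord]:
    """Turn a JSON array of objects into EditRecord instances."""
    return [EditRecord(**item) for item in json.loads(raw)]


# Made-up sample data, for illustration only.
sample = """[
  {"edit_id": 101, "timestamp": "2016-01-10T19:08:00Z",
   "source_langs": ["et", "en"], "target_lang": "vro",
   "exact": false, "filters": ["templates"], "os": "Linux"}
]"""

records = parse_records(sample)
```

With records in this form, selecting only exact translations, for example, becomes a one-line filter on the `exact` field.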

The corpus files will also include a simple script that compiles the materials stored on the server and the materials stored on Wikipedia into a basic linguistic corpus, which can use the translations either as a monolingual text corpus or as a set of parallel corpora.
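The difference between the two uses can be sketched in Python as follows. This is not the script mentioned above, only a minimal illustration that assumes the edit metadata has already been joined with the corresponding article texts; the dictionary keys (`source_lang`, `target_lang`, `source_text`, `target_text`) and the placeholder entries are hypothetical.

```python
from collections import defaultdict


def monolingual_corpus(entries, lang):
    """Collect every translated text written in one target language."""
    return [e["target_text"] for e in entries if e["target_lang"] == lang]


def parallel_corpora(entries):
    """Group (source text, target text) pairs by language pair."""
    pairs = defaultdict(list)
    for e in entries:
        key = (e["source_lang"], e["target_lang"])
        pairs[key].append((e["source_text"], e["target_text"]))
    return dict(pairs)


# Placeholder entries, for illustration only.
entries = [
    {"source_lang": "et", "target_lang": "vro",
     "source_text": "source text A", "target_text": "target text A"},
    {"source_lang": "en", "target_lang": "vro",
     "source_text": "source text B", "target_text": "target text B"},
    {"source_lang": "et", "target_lang": "liv",
     "source_text": "source text C", "target_text": "target text C"},
]

vro_texts = monolingual_corpus(entries, "vro")
by_pair = parallel_corpora(entries)
```

Here `vro_texts` treats all translations into one language as a single monolingual corpus, while `by_pair` keeps one aligned collection per source and target language pair.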

The raw data collected is currently available as a JSON file. Tools to process it will be included in the future.