The project has been conducted by a consortium comprising:
- The Department of Scandinavian Languages (University of Helsinki)
- The Department of General Linguistics (University of Helsinki)
- The Finnish Research Centre for Domestic Languages
The Department of Scandinavian Languages (Nordica) is the Lead
Participant of the consortium.
The project officially started at Nordica in August 1991. The kernel
corpus of about 2.5 million running words was established by the end of 1995.
Future plans
- The kernel corpus of 2.5 million words may be complemented with a
monitor corpus of no finite size.
- There is a good possibility of extending the section on spoken language.
At present, there are about 80,000 running words of transcribed speech.
The material was originally collected and transcribed in the project
Swedish Conversations in Helsinki (SAM).
- A historical reference corpus (1600s, 1700s, 1800s) is also being
planned. About 200,000 words of 18th-century Finland Swedish are
available, but the processing of these texts is still at a preliminary
stage.