The Institute’s aim is to open its data resources by digitizing its linguistic corpora and making them available online to a broad audience.
Licensing
The Institute’s current and future electronic language corpora that are public and free of legislative or contractual restrictions on their use will be opened up as public data resources under a Creative Commons licence and in machine-readable format. In accordance with the public administration recommendation JHS 189, which is based on the Finnish Act on the Openness of Government Activities, the primary licence is Creative Commons Attribution 4.0 (CC BY 4.0). The Institute has previously used GNU and EUPL licences. The licences currently in use also include the CLARIN licences.
In addition to the open data resources, the Institute has research corpora whose use requires a licence, usually in order to protect personal data.
Metadata from the Institute’s archives are available via the following services:
Electronic material
This is a list of the Institute’s corpora and materials available online free of charge. Some corpora have their own interface, while others can be accessed via shared platforms.