Metadata Schema

The test data sent by the pilot publisher was analyzed, and a metadata schema was developed in agreement with all participants (publishers, libraries) and the DINI-AG Electronic Publishing. The goal is reusability and conformance with the existing standards, interfaces, and delivery systems of everyone involved. If this cannot be achieved with a metadata schema alone, additional methods for data conversion, standardization, and enrichment will be discussed.

It is important to consider different perspectives and combine them pragmatically:

1. Publishers’ perspective. According to the DNB (section: Acquiring Net publications), most publishers deliver their publications in the ONIX metadata format. The national library also supports MARCXML and XMetaDissPlus. In general, publishers support formats based on Dublin Core (DC).

2. Libraries’ perspective. Ideally, the metadata can be integrated into a library catalog or a discovery system. An important concern is sufficient depth of the metadata: it should represent the complexity and relations of a publication (e.g. relationships between articles, authors, journal title, etc.). With regard to the future role of libraries as data recipients, the association between publications and institutions needs to be taken into consideration. For this, the authors' affiliations (associated university, scientific institution, institute, etc.) need to be part of the metadata so that this selection can be made.

3. Repository software perspective. According to the 2014 ‘Census of Open Access Repositories’, Germany, Austria and Switzerland primarily use OPUS (53.3%), followed by EPrints (15.1%) and DSpace (7.2%). Worldwide, DSpace is the most prevalent repository software (1,161) compared to EPrints (380) and OPUS (70). OAI-PMH can be assumed to be the interface for interoperability; SWORD is also often supported (by EPrints and DSpace, but not by OPUS). The format is usually Dublin Core (DC), or loosely based on it. The well-known repositories Europe PMC and PubMed also use DC. In the context of DRIVER and OpenAIRE, a controlled vocabulary for European repositories called info:eu-repo has been developed. This also needs to be taken into consideration. The participants are therefore in contact with the interest group “Controlled Vocabularies for Repository Assets” of the Confederation of Open Access Repositories (COAR).
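To make the repository perspective more concrete, the following minimal Python sketch shows what harvesting Dublin Core records over OAI-PMH might look like. It is an illustration under stated assumptions, not part of the DeepGreen schema: the base URL `https://repository.example.org/oai`, the sample response, and the helper names `list_records_url`, `parse_dc_records` and `eu_repo_terms` are all hypothetical. Only the `ListRecords` verb, the `oai_dc` metadata prefix, the XML namespaces and the `info:eu-repo/semantics/...` term prefix come from the respective specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_dc_records(xml_text):
    """Return one dict per record, mapping DC element names to value lists."""
    root = ET.fromstring(xml_text)
    records = []
    for dc in root.iterfind(".//oai_dc:dc", NS):
        fields = {}
        for elem in dc:
            name = elem.tag.split("}", 1)[-1]  # drop the "{namespace}" part
            fields.setdefault(name, []).append((elem.text or "").strip())
        records.append(fields)
    return records


def eu_repo_terms(record):
    """Collect all values of a record that use the info:eu-repo vocabulary."""
    return sorted(
        value
        for values in record.values()
        for value in values
        if value.startswith("info:eu-repo/")
    )


# Invented sample response; a real repository would return this over HTTP.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:type>info:eu-repo/semantics/article</dc:type>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for record in parse_dc_records(SAMPLE_RESPONSE):
        print(record["title"], eu_repo_terms(record))
```

Note that plain `oai_dc` has no dedicated element for author affiliations, which is one reason the libraries' requirement above may call for a richer format or an enrichment step.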

The DeepGreen project organized a two-day workshop in April 2016, at which academic experts, information infrastructure experts and publishers discussed requirements for the metadata schema and the workflow. The workshop was also used as a platform to discuss possible future scenarios for the DeepGreen infrastructure. The goal was to address gaps in the current protocols (workflow) and standards (metadata). Topics such as author and institutional identification methods were also part of the discussion.