This work package aims to consolidate the technical workflow of DeepGreen. More specifically, it will scrutinize the data flows between publishers and DeepGreen on the one side and between DeepGreen and repositories on the other. The task will be approached in two sub-packages.
Sub-package 1: Core DeepGreen Service
Data flow between publishers and DeepGreen. A survey on the practicality of DeepGreen was conducted among the publishing partners of the first funding period (S. Karger AG, Sage Publications, Walter de Gruyter, Royal Society of Chemistry, BMJ and Oxford University Press). A notable provisional result is that publishers consider functions that report back on the correct application of their licenses particularly important. Furthermore, publishers ask for a simpler, more concise way to deliver publications.
In order to improve the technical support for publishers, a feasibility study will be carried out: starting from openly available databases that provide publication metadata, such as CrossRef and DataCite, DeepGreen could ask publishers to automatically provide full texts whenever a publication matches a license agreement. As a consequence, publishers would need to change from their current continuous push data flows to pull data flows (acting upon requests only).
To complete the aforementioned survey, additional publishers will be included. This sub-package will also evaluate the possible incorporation of other license models (such as so-called FID licenses) into the overall processing workflow of DeepGreen.
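The pull-based matching step could look roughly like the following minimal sketch. It assumes that publication metadata has already been harvested from an open database and reduced to DOI, ISSN and publication year, and that each license agreement is described by a journal ISSN and a range of covered years. The class names, field names and the function `select_pull_requests` are hypothetical and serve only as an illustration; they do not describe the DeepGreen implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseAgreement:
    """Hypothetical, simplified license agreement: one journal, a range of years."""
    issn: str
    first_year: int
    last_year: int


@dataclass(frozen=True)
class ArticleRecord:
    """Hypothetical, simplified metadata record harvested from an open database."""
    doi: str
    issn: str
    year: int


def select_pull_requests(records, agreements):
    """Return the DOIs whose full texts would be requested from the publisher."""
    # Index agreements by ISSN so each record is checked only against its journal.
    by_issn = {}
    for agreement in agreements:
        by_issn.setdefault(agreement.issn, []).append(agreement)

    wanted = []
    for record in records:
        for agreement in by_issn.get(record.issn, []):
            if agreement.first_year <= record.year <= agreement.last_year:
                wanted.append(record.doi)
                break  # one matching agreement is enough
    return wanted
```

In such a design the publisher would only have to answer requests for the returned DOIs instead of pushing every publication.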
Workflow between DeepGreen and repositories. The DeepGreen workflow needs to be adapted to disciplinary repositories and current research information systems (CRIS).
The delivery of embargo dates to repositories will be considered as well. This information could presumably be retrieved from the Electronic Journals Library (EZB) via a REST API. A letter of intent from the University Library of Regensburg supports this cooperation.
Feedback from publishers and repositories regarding the interface, technical workflows and reporting functionalities is well documented and will be taken thoroughly into consideration in all ongoing developments of the DeepGreen service.
Sub-package 2: Partnerships with Repositories
During the first funding period DeepGreen successfully transferred publications from publishers to repositories. By extending this workflow, this sub-package aims to further automate the integration of publications into repositories. This will include the identification of duplicate and/or different versions of a publication as well as transforming license texts into machine-readable URLs. Another important issue will be author identification: publishers often deliver just first names and surnames of authors. The integration of ORCID information would enable reliable author identification to enhance the metadata transferred to repositories. As a result, this will improve the quality of DeepGreen’s distribution of openly accessible publications.
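The three metadata tasks named above can be illustrated with a minimal sketch. It assumes that deliveries arrive as dictionaries with a `doi` key, that duplicates and versions can be grouped by normalized DOI, and that license statements can be looked up in a small table. The table entries, the function names and the dictionary layout are hypothetical. The ORCID check uses the ISO 7064 MOD 11-2 check digit that ORCID documents for its identifiers; it only tests whether an identifier is well formed, not whether it belongs to the named author.

```python
import re

# Hypothetical lookup table; a real mapping would be agreed upon with the publishers.
LICENSE_URLS = {
    "cc by 4.0": "https://creativecommons.org/licenses/by/4.0/",
    "cc by-nc 4.0": "https://creativecommons.org/licenses/by-nc/4.0/",
}


def normalize_doi(raw):
    """Lower-case a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = raw.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def group_by_doi(deliveries):
    """Group deliveries by normalized DOI; groups with more than one entry
    are candidates for duplicates or different versions of one publication."""
    groups = {}
    for delivery in deliveries:
        groups.setdefault(normalize_doi(delivery["doi"]), []).append(delivery)
    return groups


def license_url(text):
    """Map a license statement to a machine-readable URL, or None if unknown."""
    key = " ".join(text.lower().split())
    return LICENSE_URLS.get(key)


def is_valid_orcid(orcid):
    """Check format and ISO 7064 MOD 11-2 check digit of an ORCID iD."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for char in digits[:-1]:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected
```

Unknown license statements deliberately return `None` rather than a guessed URL, so that they can be routed to manual review.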
For the transmission of embargo information into repositories, a new workflow has to be designed and implemented. This task could be accomplished either by an extension to the well-known SWORD protocol or by attaching the embargo information directly to the metadata of a publication.
With the perspective of including disciplinary repositories and CRIS, a consistent development of all the workflows under consideration is essential. To this end, use cases will be employed as an effective tool in cooperation with the pilot repositories of DeepGreen.
The result will be a well-defined and documented workflow for the communication between publishers and repositories via DeepGreen, which also addresses the integration of a gold open access publisher. The successful publication of this documentation will be the final milestone of this package.
In this way, the prototype DeepGreen software will be brought to the level of a continuously running production service.
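The second option, attaching the embargo information directly to the metadata, might be sketched as follows. The sketch assumes that the embargo is known as a number of months counted from the publication date and that the metadata is held as a dictionary. The key `embargo_end` and both function names are hypothetical; they are not part of SWORD or of any repository schema.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` months after `start`, clamping the day
    to the last day of the target month (e.g. 31 January + 1 month)."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def with_embargo(metadata, published, embargo_months):
    """Return a copy of the metadata with a hypothetical 'embargo_end' field."""
    enriched = dict(metadata)
    enriched["embargo_end"] = add_months(published, embargo_months).isoformat()
    return enriched
```

A repository receiving such a record could keep the full text closed until the stated date without having to know the underlying license terms.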