Thirteenth International Conference on Grey Literature Library of Congress, Washington D.C., USA December 5-6, 2011
Session Four - Data Frontiers
Federal Information System on GL in Russia: A new stage of development in digital and network environment
Aleksandr V. Starovoitov, Aleksandr M. Bastrykin, Anton I. Borzykh, and Leonid P. Pavlov, CITIS, Russia
The Russian Federation has inherited the federal-level information system on grey literature from the Soviet Union. The system covers the most informative kinds of grey literature, scientific research and development reports and post-graduate theses, as sources of scientific and technical information. These are collected centrally at the Centre of Information Technologies and Systems of Executive State Authorities (abbreviated in Russian as CITIS) in accordance with the Federal Law “On the obligatory copy of documents”. The law obliges all organizations (the collective authors of reports) and persons (the individual authors of dissertations) to give a free full-text copy of their documents to CITIS. In turn, the Centre is obliged not only to complete and permanently store the collection but also to disseminate information on its content. Over the past decades the system has undergone several modifications to adapt to changing organizational and technological realities.
In its present state the federal system combines the following three functionally separate systems run by CITIS: the traditional system for collecting, processing, storing and providing access to R&D reports and theses, called the computerized information system on science and technology (abbreviated in Russian as ASINIT), which has recently been improved to store full-text reports and dissertations in digital form and to provide full-text search and retrieval; the system for registering and monitoring self-funded research projects, which was put into operation in mid-2000 to reflect a growing trend of funding R&D projects from research organizations’ own financial resources; and the federal register for the results of scientific and technical activities, also created in mid-2000 with the idea of monitoring the life-cycle of patentogenic findings documented in scientific reports. All three systems are operative and fulfill their functions; however, rapidly changing digital and network technologies create a new environment in which to increase the systems’ efficiency and improve their services. The paper focuses on a new project under development at CITIS under the auspices of the newly started State programme “Information society (2011 – 2020)”. The project aims to create the Integral state information system on scientific research and development, which is intended to unite the three systems using unified input document forms so that users need to enter the same information only once, in an interactive network environment. The integral system will use tools for full-text digital document analysis and web technologies in order to improve data mining and to avoid plagiarism.
Enhancing diffusion of scientific contents: Open data in Repositories
Daniela Luzi, Rosa Di Cesare, and Marta Ricci, IRPPS-National Research Council; Roberta Ruggieri, Senato della Repubblica, Italy
The free availability of data gathered during research activities is becoming one of the new challenges of the Open Access Movement. New scientific instruments and technologies used in highly collaborative fields such as molecular biology and the earth and space sciences make it possible to collect a great amount of data in different formats. Moreover, data are often associated with tools that can aggregate them, as well as with direct references to the publications, conventional or non-conventional, that report the results of their analysis. The benefits of making these data available are evident: they include, among others, the possibility to assess research results and to reproduce and re-use data to draw new insights for future research. An increasing number of repositories also include datasets among the scientific results made openly available. Given the variety of scientific contexts in which datasets are produced, as well as the diversity of dataset types, the objective of our study is to analyse this trend, highlighting the main issues related to dataset provision, availability and preservation. In this context, the definition of a dataset, its function in providing value-added information for research, the ownership of data, and the development of a suitable infrastructure (within repositories and/or in other types of information systems) in which datasets can best be diffused and re-used are part of an ever-increasing list of issues on the current research agenda. Our source of analysis is the Directory of Open Access Repositories (OpenDOAR), which lists international repositories under different criteria and allows different types of analyses.
For the purpose of our study, we selected the option “datasets” in the listed category “Content type”, which provides information on and links to repositories that contain datasets. Our survey focuses first on identifying the types of repositories and institutions providing datasets, and then on analysing each of the 78 repositories currently reported in OpenDOAR. In particular, this analysis will concentrate on:
• Dataset content (raw data, spatial data, formulae, etc.)
• Data layout (spreadsheets, tables, images, etc.)
• Relation with other digital objects in the repository
• Metadata used to describe datasets
• Access modality
Moreover, the possible inclusion of datasets (and/or other types of digital scientific content) in the realm of Grey may open up new perspectives and challenges in tracking the knowledge creation process, as well as in the collection and preservation of value-added information, considering both newly available technologies and the evolving needs of scholarly communication.
Research product repositories: Strategies for data and metadata quality control
Luisa De Biagi, Roberto Puccinelli, Massimiliano Saccone, and Luciana Trufelli, National Research Council, Italy
In recent years R&D institutions and scientific information stakeholders in general have made a significant effort to enhance the quality of Open Access initiatives and the performance of the associated services. Nevertheless, much work is still needed to tackle pending data quality issues. This paper proposes some functional and organisational solutions, based on the cooperation of all the main actors of the R&D system, which in our view should help improve quality control of the data and descriptive metadata stored in research product Open Access (OA) repositories. We think that this strategy could foster substantial innovation in the document management services offered to the scientific community and to policy makers, ensuring interoperability between institutional repositories and Current Research Information Systems (CRIS). Particular emphasis is given to the problem of indexing and organizing data and metadata for unconventional research products, which represent an important asset in the field of scientific communication.
Linking full-text GL to underlying research and post-publication data: An Enhanced Publications Project
Dominic Farace and Jerry Frantzen, GreyNet, Netherlands; Christiane Stock, INIST-CNRS, France; Laurents Sesink, Data Archiving and Networked Services, Netherlands; and Debbie Rabina, Pratt Institute, United States
In a way, this project seeks to circumvent the data vs. documents camps in the grey literature community by way of a middle ground provided through enhanced publications. Enhanced publications allow for a fuller understanding of the process in which data and information are used and applied in the generation of knowledge. The enhanced publication of grey literature precludes the idea of a random selection of data and information, and instead focuses on human intervention in data-rich environments. The definition of an enhanced publication is borrowed from the DRIVER-II project: “a publication that is enhanced with three categories of information: 1) research data, 2) extra materials, and 3) post-publication data”. Enhanced publications combine textual resources, i.e. documents intended to be read by human beings that contain an interpretation or analysis of primary data, with these categories of information. Enhanced publications inherently contribute to the review process of grey literature, as well as to the replication of research and the improved visibility of research results in the scholarly communication chain. The goal of this project is threefold: to enhance GreyNet’s existing collection of conference preprints by adding corresponding links to research data; to include commentaries, i.e. post-publication data, on GreyNet’s existing conference preprints in metadata records; and to establish a workflow for future GreyNet enhanced publications based on the results of this project, in which permanent access to full texts and their data-enriched components is provided via persistent identifiers accessible in trusted repositories.
The four partnering organizations are brought together based on their expertise and the tasks they will execute during the course of the project. GreyNet will work together with INIST-CNRS to devise a questionnaire and carry out a survey among its author base to acquire research data linked to conference preprints. GreyNet will facilitate data entry in the DANS Easy Repository, with links back to the corresponding metadata records in the OpenGrey Repository. GreyNet will also cooperate with Pratt Institute to establish basic criteria upon which commentaries by LIS students will be compiled and added to existing metadata records. Not all grey literature is based on research data, and this holds for GreyNet’s collection of conference preprints. While it is anticipated that GreyNet’s contributing authors will be inclined to make their research data available, some data from previous years will not be retrievable. Since student commentaries are tied to academic credit, this portion of the project is expected to yield optimal results. The combined results of the project will contribute to a future workflow not only for GreyNet’s enhanced publications but also for other grey literature communities.
TextRelease GL13 Program and Conference Bureau Javastraat 194-HS 1095 CP Amsterdam The Netherlands