How we can share our research with the world (and why we should bother)
The open access idea:
All public scientific efforts should be freely available globally.
Today I gave a lunchtime seminar on the topic of Open Access and Open Data. This is something I’ve been getting increasingly interested in over the last few years, and two events coincided to persuade me to talk about it to the other members of the department. Firstly, I went to an excellent talk by Jelena Aleksic, covering much of the same ground, at a recent Cambridge Open Research group meeting. Secondly, I was handed the keys to my divisional seminar scheduling list in the Department of Pathology, and left to go wild. I decided to abuse my power, and give myself a slot to talk about something a bit out of the ordinary but that I think is very important. The following post is basically a summary of the talk I gave (the slides are available from SlideShare).
Why does open access matter?
Traditional publishing has followed a simple model: scientists write papers and send them to a journal. Other scientists review the paper, the journal adds some formatting, then sells subscriptions so that other people can read the paper. This model has a problem: it restricts access to results to those who can afford to pay the (very hefty) subscription fees, which affects not just universities but also businesses, interested lay people, and the public at large. Anyone with an interest in governments producing evidence-based policies will see the problem with their representatives not being able to actually see the evidence.
Further, there is the issue of machine readability and text mining. There are more than 1 million papers being published every year in biosciences alone; that represents a huge torrent of information that, ideally, we would pass on to computers to identify genes, compounds, diseases and make links between fields that may otherwise be missed. Mostly we can’t do this – not for technical reasons, but because traditional publishers don’t allow it.
What are the open access models?
The two main models for open access I described are the gold and green models. Gold access is basically a rearrangement of the traditional model: instead of paying to read a paper, you pay to publish a paper. This means that anyone anywhere can read your research. The downside is that it costs a lot (often >£1000) to publish, and again this restricts scientific publishing to the wealthy. Some open access journals do scale their fees depending on where the submitter is based, but it remains an exclusive means of publishing.
A more appealing option is the green model. In this case, publishing is free and the responsibility for sharing the research falls on the submitter; they are given permission by the publisher to post their pre-print or post-print to a repository or personal website (and in this case, a scientific networking site like ResearchGate or Academia.edu is considered a personal site). The only thing they cannot share is the final formatted PDF, and well-off institutions can maintain subscriptions to receive access to these better formatted versions.
While green access gets around many of the issues of gold access, it still has some problems of its own; it relies on author motivation, and some authors have a reluctance to use a pre-print server. Fields like physics and maths have a far greater usage of such pre-print archiving than biology (though this may change since the introduction of the bioRxiv pre-print server last year).
How can we determine what the options are for a journal, and how can we know if our funding bodies will permit us to publish in a given journal? The SHERPA/RoMEO service is maintained by the University of Nottingham, and tracks the open access policies of many journals plus how they comply with different funding body requirements. It lists whether pre-prints, post-prints or publisher versions of a paper can be self-archived, and saves a lot of time that would otherwise be spent digging around the depths of journal websites.
What other platforms are there to enable data sharing?
Some data types have well-established repositories. Microarray data is widely stored in GEO or ArrayExpress and annotated such that anyone can re-analyse the data and check the authors’ conclusions. Other forms of data do not have these resources, but would be useful if made available. For example, if a phylogenetic tree is given as a figure in a paper, then the original tree file in Newick format or similar would make subsequent analysis easier. For this type of data sharing, generic repositories like Figshare and DataDryad are useful. They are not always free, but many partner with journals to provide storage space when a paper is accepted.
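To illustrate why a shared tree file is more useful than a figure, here is a minimal sketch in Python of pulling the tip labels out of a Newick string. The tree itself and the function name `leaf_names` are made up for illustration, and the parsing is deliberately simplified (it ignores quoted labels and comments); a real analysis would more likely use a dedicated phylogenetics library.

```python
import re


def leaf_names(newick):
    """Return the leaf (tip) labels of a Newick tree string, in order.

    Simplified sketch: a leaf label is any name that directly follows
    '(' or ','. Internal node labels follow ')' and are skipped.
    Quoted labels and bracketed comments are not handled.
    """
    return re.findall(r"[(,]\s*([^\s(),:;]+)", newick)


# A hypothetical four-species tree with branch lengths.
tree = "((human:0.1,chimp:0.1):0.05,(mouse:0.3,rat:0.3):0.2);"
print(leaf_names(tree))  # ['human', 'chimp', 'mouse', 'rat']
```

With only a figure, a reader would have to retype the names and estimate branch lengths by eye; with the file, a few lines like these recover them exactly.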
How to promote open access?
Levels of open access by year, redrawn from [1]. Data is pooled from figures 1 and 2
At the moment, the levels of open access publication are quite low; the figure shows some data [1] showing that since the late 90s there has been little increase in open access publishing in the bioscience areas. Some fields like physics and maths are higher, but still not much over 30%. Many are indeed lower. Effecting change here requires both top-down and bottom-up approaches, and fortunately the main funding bodies in the UK agree.
The Research Councils UK (RCUK) include the BBSRC, MRC and EPSRC, and they provide much of the research funding to universities. They have mandated that, since April 2013, any published research they funded must be made open access no later than six months after publication. The HEFCE (Higher Education Funding Council for England) has gone a step further. They administer the Research Excellence Framework (REF), the assessment of universities that, in part, determines how much government funding the university will receive. Each university department puts forward their best candidate papers for consideration in the REF; the HEFCE has recommended in a recent consultation that post-2014, a paper must be deposited immediately in an institutional repository upon acceptance to be admissible.
Such top-down approaches do work. A comparison of individual institutions that mandate open access publication versus those that don’t found that on average, mandating open access led to three times greater open publishing amongst their staff (60% vs 20%).
Other approaches to publishing
Not all open access publishing falls cleanly into the green or gold categories; PLoS has a spectrum of criteria to consider on open access, from reader rights, to author rights, to the machine readability of the data in the papers. They also publish the journal PLoS One, which reviews only for technical correctness – not importance of the research. As a result, they now publish more than 20,000 papers per year. PeerJ is a journal that uses a one-off lifetime subscription for authors – after this, an author can publish in PeerJ as many times as they like for no further charge. The Faculty of 1000 is pushing even further for openness: they keep the peer review process open as well. A paper is published immediately upon submission, after a technical check by the editors. The review comments, and the subsequent revisions to the manuscript are open for all to see.
Conclusion
Open access is important for everyone, not just scientists. It has taken a long time to get even this meager level of OA publishing, and we need to keep pushing onwards. The top-down support for open access in the UK (and other countries) is helping the process immensely, as is the growing recognition of the problem amongst scientists. Inertia can take a lot of effort to overcome, but we are getting there.