Lost and Found: Discoverability and the politics of being discovered

“…in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.”^[i]

In many ways, we now live in the information-rich world presciently described by Herbert Simon in 1971. Year on year, more journals are launched and more articles are written than at any other time in history. Estimates put the number of peer-reviewed scholarly journals published in the English language alone at 28,100 and the number of articles written for these journals annually at 2.5 million.^[ii] The Google Scholar search base, which includes journals, books and grey literature, is estimated to contain between 100 and 160 million items.^[iii] As well as these traditional research outputs, there is another layer of more ephemeral material: websites, blogs and academic social media. A sense of the size and scale of the informational environment in which these objects exist is given by the website www.internetlivestats.com, which visually presents the inexorable growth in the amount of information uploaded to the internet.

Alongside this abundance of information, there are also ever greater numbers of searchers. Globally, the number of scholars continues to rise, and the expanding body of research made available through open access publishing (9,508 journals and 2,512,837 articles, according to the Directory of Open Access Journals at time of writing) indicates that much research can now, at least in theory, be accessed by a truly global audience. Growth has therefore occurred not only in the quantity of research to be found, but also in the number of different pathways that lead to the discovery of relevant material.

Taking this into account, evidence suggesting that as many as one third of social science papers are never cited is troubling.^[iv] Citation is of course a weak indicator of attention, as it represents only the formal recognition of research by other academics. However, if one in three papers appears to have a minimal impact on its closest audience, it suggests either that a depressingly large amount of research has little immediately recognizable value, or that there are inefficiencies in the ways research is being communicated and discovered. We may therefore have brought into being the information-rich world anticipated by Simon, but the question of how to equitably and effectively distribute attention across this body of knowledge remains complex and lies at the heart of what is somewhat inelegantly known as discoverability.

Discoverability has for this reason become a matter of concern across the scholarly communication system. For researchers and research funders, the need to have research found and to be able to find relevant research is vital to the advancement of knowledge. For librarians and publishers, a key part of the services they now provide to scholars is the provision of infrastructures to make research discoverable and the tools to facilitate the integration of research outputs into these infrastructures. What then should researchers be aware of when thinking about discoverability?

To some extent, discoverability is a technological problem. The vast majority of literature searches are now carried out through online search systems, and in the UK most searches begin with an internet search engine, such as Google Scholar,^[v] a trend that appears even stronger amongst early career researchers.^[vi] A consequence of this change in search practices, as anyone who has used these tools will recognize, is that searches now incorporate a wider cross-section of the scholarly literature. The results are therefore more diverse and less likely to be confined to a single journal or discipline. This kind of searching places greater emphasis on the content of specific articles, as opposed to the journals they sit within. The metadata, or digital tags, which describe this content and allow it to be sorted into relevant bodies of knowledge have therefore become increasingly important. Furthermore, advances in machine reading, combined with a move towards HTML-formatted articles and more permissive licensing agreements, notably the CC-BY license associated with open access publishing, are allowing the very language and construction of articles to be analyzed by computers for the purposes of discovery. The ability of search engines to function in this way is partly related to the integration of publishers’ websites and practices into these systems. However, it is also becoming incumbent on researchers to reassess their own use of language when writing papers and devising metadata, such as keywords and abstracts.

This may seem a simple task, but it is important to consider how language and search practices interact. One way to approach this, to borrow Tony Becher’s phrase, is to think of research disciplines as tribes. Each academic tribe has its own disciplinary fields that it tends and develops, frontiers and new ideas that it explores, and rival territories into which it can expand. Becoming an academic requires you to position yourself within a tribe and to learn its language and geography, which in turn shapes the way you search for and assess the relative value of a research paper. Using specific language in titles, abstracts and keywords can therefore situate your research squarely in one disciplinary field, but it can simultaneously exclude it from interacting with emerging research frontiers, or from crossing into the territories of other academic tribes. Reflecting on these boundaries between different academic tribes, or even groups outside of academia, is therefore vital to producing research and metadata that can make your work visible and relevant to specific audiences. Not doing so could be seen as the digital equivalent of deliberately putting your book on the wrong shelf in a library and expecting it to be found.

A second strand of discoverability lies in utilising different types of metadata and expanding the pathways that lead to research. To give a handful of examples from what is a rapidly growing sector,^[vii] services such as ORCID provide users with a unique digital identifier, which allows different research outputs to be linked to a single searchable identity. Figshare enables the publication of supplemental material associated with research, such as datasets or video, that would not normally be supported by traditional research publications. In the field of regional studies, an important development could be JournalMap, which enables research to be geolocated, thereby allowing searches to be made by location rather than subject matter. These services all provide research objects with a broader profile and create new pathways through which research can be discovered and accessed. They effectively expand the networks of weak links associated with a piece of research and at the same time provide researchers with a feedback loop of information about these links and potential new audiences. However, they are also to some extent passive, in that they require searchers to look for research in particular places or in particular ways.

For this reason, discoverability is not just a technological problem of enabling links to be formed between researchers and searchers; it is also a social issue of creating and nurturing relationships. Writing in the 1940s, the sociologist Robert Merton described academic research as being founded on four social norms: communism (in the sense of shared ownership of knowledge), disinterestedness, universalism and organized scepticism. From this perspective, research is seen as a distinctly social endeavour, based on close reciprocal exchanges of knowledge and experience between interested parties. The goal of facilitating these strong links is increasingly being taken up by academic social media platforms such as Mendeley, ResearchGate and Academia.edu, and collaborative publishing platforms like F1000. However, strong links continue to be predominantly produced through the traditional activities of academic associations and learned societies, which, through their work in organizing journal publication and conferences, provide a social context for the creation of strong links between academics. Returning to the idea of attention, it is often within these networks that ideas can be developed and disseminated effectively, creating an attentive core audience that then reaches out into wider networks.

Taken as a whole, the concept of discoverability can be seen to reflect a shift towards the increasing use of automated tools for finding and assessing the relevance of research, in a domain that has traditionally been dominated by academic peer review. The impact of this change on research practices is only beginning to be felt and should not be viewed as merely a technological development. While the capacity of these systems to make large amounts of information easily comprehensible and accessible should be acknowledged, so too should the more problematic issues inherent to discoverability. In particular, one question raised by these systems is whether they incentivize garnering attention for research as an end in itself. This may seem unlikely, but then one might reflect on how the term ‘refable’ (referring to the UK’s Research Excellence Framework) for research papers has become common in UK academia, indicating in a fairly unambiguous way how supposedly disinterested research can become entangled in regimes of assessment. In an academy where demonstrating the reach and impact of research becomes increasingly important for career progression, it is not impossible that a type of research that is short term, superficial and attention seeking, or what has been branded ‘Trump’ academia, might become privileged.^[viii]

A related issue comes from the introduction of new commercial entrants into scholarly communication. Here, the extent to which corporate interests might distort scholarly communications is made apparent in backlashes to developments such as Academia.edu’s proposal to allow its users to pay for recommendations for their research.^[ix] Less overt, but more concerning, might be the way that companies seek to monetise aspects of the attentional economy by selling the metadata they collect through these services back to governments and academic institutions. In this instance, the old adage that if the product is free then you are the product may be of particular relevance to academics, as the information they freely give to these organizations becomes the basis for new forms of performance management in academia. This is not to say that such outcomes are inevitable. However, developing alternatives may require academics to become increasingly organized and engaged in constructing new forms of scholarly communication, rather than sleepwalking towards an uncertain future.

Michael K Taster

References:

^[i] Simon, H. A. (1971), ‘Designing Organizations for an Information-Rich World’, in Greenberger, M. (ed.), Computers, Communications, and the Public Interest, Baltimore: Johns Hopkins Press.

^[ii] Ware, M. & Mabe, M. (2015) The STM Report, An overview of scientific and scholarly journal publishing. Available online: https://www.stm-assoc.org/2015_02_20_STM_Report_2015.pdf

^[iii] Ibid.

^[iv] Remler, D. (2014), Are 90% of academic papers really never cited? Reviewing the literature on academic citations, LSE Impact Blog, Available online: https://blogs.lse.ac.uk/impactofsocialsciences/2014/04/23/academic-papers-citation-rates-remler/; Harrison, J. (2017), Can you publish and still perish? A question of impact, Taylor & Francis Author Services Blog, Available online: https://authorservices.taylorandfrancis.com/can-you-publish-and-still-perish/

^[v] Wolff, C. et al. (2015), UK Survey of Academics 2015, Ithaka S+R, JISC, RLUK. Available online: https://doi.org/10.18665/sr.282736

^[vi] Nicholas, D. et al. (2017), Where and how early career researchers find scholarly information, Learned Publishing, 30(1), pp.19-29.

^[vii] Well documented by the 101 Innovations in Scholarly Communications project: https://101innovations.wordpress.com

^[viii] Morrish, L. (2016), The Rise of the Trump Academic, The Sociological Review Blog, Available online: https://www.thesociologicalreview.com/the-rise-of-the-trump-academic.html

^[ix] Ruff, C. (2016), Scholars Criticize Academia.edu Proposal to Charge Authors for Recommendations, The Chronicle of Higher Education, Available online: https://www.chronicle.com/article/Scholars-Criticize/235102