PubMed is a popular search engine for biomedical literature. It has lost a lot of ground to Google Scholar over the past few years, but for a long time it was the go-to scientific search engine for psychologists, neuroscientists, and the like. And the cool thing is that, unlike Google Scholar, PubMed allows you to write scripts to automatically download enormous amounts of information.
Which is what I did over the weekend: I downloaded information about scientific articles. Names, authors, abstracts (summaries), journal titles, etc. And lots of it. I figured I would eventually get banned for abusing the PubMed service, but I didn't, and the end result is a database containing 257,535 articles published between 1950 and 2010 in 43 academic journals, broadly focused on neuroscience and cognitive psychology (see note 1). To the extent that PubMed has a complete index, this should include a large proportion of all articles published between those years in those journals.
So that's a lot of data!
I'm planning to write a series of blog posts, each time focusing on a different aspect of this data set. My main aim will be to understand the whole system of academic publishing just a little bit better, and to see how it has evolved over the years. All the while keeping in mind, of course, that even this quarter of a million articles reflects just a tiny fraction of the total volume of scientific output. And a biased fraction at that, because the journals have been hand-picked. By me.
For this first post, I'll start with the basics. Just how many papers are being published each year? Let's take a look:
Figure 1. The number of published articles per year. The grey lines indicate individual journals.
Wow! In 2010 alone, 14,544 articles were published across 39 journals (the blue line, which has been scaled down by a factor of 10 to make it fit in the graph). And it's a fair bet that in 2011 this number will be even higher, because scientific output has been rising constantly and explosively over the past decades.
There is a caveat though: The dataset includes a lot of new journals, but hardly any old journals that have gone out of existence. This is partly because, like the number of articles, the number of journals has increased. But also because I wouldn't know about a journal if it went out of existence a few decades ago. In other words, my cherry-picking induces a bias, which may cause the growth to appear larger than it really is.
So, just to be sure that the increase in science-mass is real, I also plotted the average number of articles per journal (the orange line). Obviously this is a gross underestimation of the overall increase, because it does not take into account that the number of journals has also increased. But even the average number of articles per journal has doubled over the past two decades.
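For readers who want to reproduce the two curves, here is a minimal sketch of the bookkeeping. It assumes the database can be reduced to a list of (journal, year) pairs, one per article, and that the average is taken over journals with at least one article in that year. Both are assumptions on my part, and the toy records are made up for illustration.

```python
from collections import Counter, defaultdict


def yearly_counts(records):
    """Return (total articles per year, mean articles per active journal per year).

    `records` is an iterable of (journal, year) pairs, one pair per article.
    A journal counts as "active" in a year if it has at least one article then.
    """
    totals = Counter()
    journals_by_year = defaultdict(set)
    for journal, year in records:
        totals[year] += 1
        journals_by_year[year].add(journal)
    means = {year: totals[year] / len(journals_by_year[year]) for year in totals}
    return dict(totals), means


# Made-up toy data: two journals in 2009, three in 2010.
records = [
    ("A", 2009), ("A", 2009), ("B", 2009),
    ("A", 2010), ("A", 2010), ("A", 2010), ("B", 2010), ("C", 2010),
]
totals, means = yearly_counts(records)
print(totals)
print(means)
```

In the toy data the total rises from 3 to 5 articles, while the per-journal mean rises only from 1.5 to about 1.67, because part of the growth comes from the new journal "C". That is the same reason the per-journal average underestimates the overall increase.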
Whether this prodigious growth in scientific output is a good thing is open to debate. Personally, I'm a little sceptical, because more than anything else it reflects the pressure on scientists to publish lots of articles. University libraries, which are the primary consumers, will buy pretty much all scientific literature by default. So we have a) scientists who want to publish as much as possible (to gain a good reputation), b) academic publishers that want to sell as much as possible (to make money), and c) totally uncritical consumers that will buy anything that is thrown their way. All the ingredients for a vicious circle, in other words, which explains the more-or-less exponentially growing science-mass.
On to the next statistic: The number of authors per article.
Figure 2. The number of authors per article.
As you can see, the mode, or the most common number of authors, is two. This is because articles are often written by young researchers, under supervision of a senior researcher. Single-author papers are relatively rare, because researchers who can act alone are typically senior. And seniors typically don't want to do everything by themselves.
The graph above shows the overall statistics across 60 years. But have there been changes in the constitution of the typical academic team?
Figure 3. The size of academic teams across the years.
Yes! Teams are growing. The conventional junior-senior formation is rapidly losing ground to larger teams of researchers. (All lines are rising, of course, but this is due to the overall increase in science-mass. Here we focus on relative differences.) Presumably, this has to do with the fact that a lot of the new science-mass comes from huge biomedical laboratories that produce papers with enormous numbers of authors. (The term 'author' should obviously be taken with a grain of salt. In these cases, authorship is used as an acknowledgement for someone's contribution to the research programme.) The largest number of authors on a single paper in my database is 158!
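Because every line rises with the overall science-mass, the comparison only makes sense in relative terms. Here is a minimal sketch of one way to do that, by converting the counts per team size into shares of that year's articles. The (year, number of authors) record format, the "5+" bucket, and the toy data are all assumptions for illustration, not the grouping used in the figure.

```python
from collections import Counter, defaultdict


def team_size_shares(records, max_size=5):
    """Return {year: {team-size label: share of that year's articles}}.

    `records` is an iterable of (year, n_authors) pairs, one per article.
    Team sizes of `max_size` or more are pooled into a single "N+" bucket.
    """
    counts = defaultdict(Counter)
    for year, n_authors in records:
        label = str(n_authors) if n_authors < max_size else f"{max_size}+"
        counts[year][label] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {label: n / total for label, n in counter.items()}
    return shares


# Made-up toy data: (year, number of authors).
records = [
    (1980, 1), (1980, 2), (1980, 2), (1980, 2),
    (2010, 2), (2010, 4), (2010, 6), (2010, 158),
]
shares = team_size_shares(records)
print(shares[1980])
print(shares[2010])
```

In the toy data, two-author papers drop from three quarters of the articles in 1980 to one quarter in 2010, even though such a share says nothing about how many articles were published in total.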
But we're just getting started! Stay tuned for more insights into the world of academic publishing.
1. I downloaded information about the following journals from 1950 to 2010 (names as in PubMed, separated by semicolons because some names contain commas): Acta Psychologica; Attention, perception & psychophysics; Behavior research methods; Behavior research methods, instruments, & computers : a journal of the Psychonomic Society, Inc; Behavioral neuroscience; Behavioural brain research; Brain and cognition; Brain research; Cereb cortex; Cognition; Cognitive neuropsychology; Cognitive Psychology; Current biology : CB; Exp Brain Res; J Neurosci; Journal of Cognitive Neuroscience; Journal of experimental psychology. General; Journal of experimental psychology. Human perception and performance; Journal of experimental psychology. Learning, memory, and cognition; Journal of Neurophysiology; Journal of Vision; Memory & cognition; Nature reviews. Neuroscience; NeuroImage; Neuron; Neuropsychologia; Neuroreport; Neuroscience letters; Perception; Perception & Psychophysics; PLoS Biology; Psychological review; Psychological Science; Psychonomic bulletin & review; Seeing and perceiving; Spatial Vision; The Behavioral and brain sciences; The European journal of neuroscience; The quarterly journal of experimental psychology; Trends in Cognitive Sciences; Trends in neurosciences; Vision Research; Visual Cognition.
A separate query per year and journal was executed, like so: (("Journal of Vision"[Journal]) AND "2010"[Date - Publication] : "2010"[Date - Publication])
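The query pattern above is easy to generate in a loop. What follows is a minimal sketch, not the script I actually ran: the function names are made up for illustration, and it assumes the queries go to NCBI's public E-utilities ESearch endpoint. It only builds the query strings and URLs and does not contact PubMed.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (assumed here; the post does not
# say which interface was used to run the queries).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(journal, year):
    """Return the PubMed search term for one journal and one publication year."""
    return (
        f'(("{journal}"[Journal]) AND '
        f'"{year}"[Date - Publication] : "{year}"[Date - Publication])'
    )


def all_queries(journals, first_year=1950, last_year=2010):
    """Yield (journal, year, query) for every journal/year combination."""
    for journal in journals:
        for year in range(first_year, last_year + 1):
            yield journal, year, build_query(journal, year)


def esearch_url(query, retmax=100):
    """Build (but do not fetch) an ESearch URL for the given query."""
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


journals = ["Journal of Vision", "Vision Research"]  # two names from note 1
queries = list(all_queries(journals))
print(len(queries))                 # one query per journal and year
print(queries[60][2])               # the "Journal of Vision" query for 2010
print(esearch_url(queries[60][2]))
```

With 43 journals and 61 years this comes to a few thousand queries, so anyone trying this should pause between requests and check NCBI's usage guidelines before fetching anything.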