Tuesday, March 28, 2017

Is Academia Biased Against Stars?

David Card and Stefano DellaVigna recently came out with a working paper which says yes.

The (abbreviated) abstract:
We study editorial decision-making using anonymized submission data for four leading
economics journals... We match papers to the publication records of authors at the time of submission and to subsequent Google Scholar citations.... Empirically, we find that referee recommendations are strong predictors of citations, and that editors follow the recommendations quite closely. Holding constant the referees' evaluations, however, papers by highly-published authors get more citations, suggesting that referees impose a higher bar for these authors, or that prolific authors are over-cited. Editors only partially offset the referees' opinions, effectively discounting the citations of more prolific authors in their revise and resubmit decisions by up to 80%. To disentangle the two explanations for this discounting, we conduct a survey of specialists, asking them for their preferred relative citation counts for matched pairs of papers. The responses show no indication that prolific authors are over-cited and thus suggest that referees and editors seek to support less prolific authors.
The interpretation, of course, hinges on whether you think more famous authors are likely to get cited more, conditional on quality. To some extent, this must be true. You can't cite a paper you don't know about, and you're almost certainly more likely to know about a famous author's paper. This is part of the reason I started blogging -- as a marketing tool.

In any case, here's a second recent paper courtesy of Andrew Gelman. "Daniele Fanelli and colleagues examined more than 3,000 meta-analyses covering 22 scientific disciplines for multiple commonly discussed bias patterns. Studies reporting large effect sizes were associated with large standard errors and large numbers of citations to the study, and were more likely to be published in peer-reviewed journals than studies reporting small effect sizes... Large effect sizes were ... associated with having ... authors who had at least one paper retracted." So, studies with large effects are cited more and are also more likely to be retracted. In general, to get published in a top journal, you almost have to make a bold claim. But of course bold claims are more likely to be wrong or inflated, and they also get cited more.

My prior is that any time "quality" is difficult to judge, it is only natural and rational to look for signals of quality, even blunt ones. I would rather be operated on by a surgeon with a degree from Harvard, drink a pricey bottle of wine if someone else is paying, and eat in a restaurant which is full. I'd think twice before putting my money in a bank I've never heard of, or flying on an unknown airline. These are all perfectly rational biases. The more difficult it is to infer quality, the more people are going to rely on signals. An example of this is linemen and linebackers in the NFL -- since good stats for these players are hard to come by, all-star linemen tend to be those drafted in the first or second round. This is much less so for the skill positions, where performance metrics are easier to come by. In the case of academia, I believe it can be enormously difficult to judge the relative quality of research. How do you compare papers on different topics using different methodologies? Add to this that top people might referee 50 papers a year. Neither they nor the editors have time to read papers in great detail. And referees frequently disagree with each other. I was recently told by an editor at one of the journals in the Card study that in several years of editing she had never seen a paper in which all the referees unanimously recommended accept. If referees always agreed, unanimous accepts should happen about as often as a single referee recommends accept -- roughly 5% of the time. (Unfortunately, the Card/DellaVigna paper doesn't provide information on the joint distribution of referee responses. This is too bad, because one of the big gaps they find between R&R decisions and citations is that papers in which one referee recommends reject and the other a straight accept are usually not given an R&R, and yet are well-cited. The other thing they don't look at is institutional affiliation, but I digress...) This all speaks to the idea that judging research is extraordinarily difficult. If so, then citations, referee reports, and editorial decisions are likely to rely, at least in part, on relatively blunt signals of quality such as past publication record and institutional affiliation. It would be irrational not to.
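For a rough sense of the magnitudes, here's a quick back-of-the-envelope check in Python. The numbers are purely illustrative assumptions on my part (a ~5% per-referee accept rate and three reports per paper), not figures from the Card/DellaVigna data:

# Illustrative assumptions, not numbers from the paper:
p_accept = 0.05   # assumed probability a single referee recommends accept
n_referees = 3    # assumed number of reports per paper

# If referees were perfectly correlated (always agreed), unanimous accepts
# would occur as often as a single referee recommends accept.
p_unanimous_if_perfectly_correlated = p_accept

# If referees were statistically independent, unanimous accepts would be far rarer.
p_unanimous_if_independent = p_accept ** n_referees

print(f"Perfect agreement: {p_unanimous_if_perfectly_correlated:.2%} of papers")  # 5.00%
print(f"Independence:      {p_unanimous_if_independent:.4%} of papers")           # 0.0125%

Under perfect agreement an editor should see a unanimous accept on roughly 1 in 20 papers, so never having seen one in several years is at least informal evidence that referee opinions are far from lockstep.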

So, why did the authors go the other way? I didn't find their use of the survey convincing. I suspect it had to do with personal experiences. The average acceptance rate at the QJE is 4%. That means top people like David Card get good papers (and I believe many of his papers are, in fact, very good) rejected at top journals all the time, despite high citations. He has a keen analytical mind, so it's reasonable for him to conclude based on personal experience that rejections despite great citation counts are the result of some kind of bias, and perhaps they are. I once had a paper rejected 9 times. Naturally, I don't believe it could possibly be the fault of the author -- much easier to believe the world is conspiring against me. I'm joking, of course, but the very low acceptance rates at top journals, combined with some atrocious accepted papers, probably feed this perception for everyone. Personally, I'd still prefer to be an NBER member and in a top 10 department with a weekly seminar series in my field (all of which would increase citations) and a large research budget, but hey, that's just me.

Update: I was interested in how correlated referee recommendations are, so I followed the reference in the Card paper to Welch (2014). Here is the key table. Conditional on one referee recommending "Must Accept", the second referee has a 7.7% chance of also recommending "Must Accept", vs. a 3.8% unconditional probability. If recommendations were perfectly correlated, that conditional probability would be 100%. Even with one "Must Accept" in hand, chances are still better than even that the second recommendation will be reject or neutral. So referee recommendations are much closer to completely random than to being in agreement. The lesson I draw from this is not to read too much into a single rejection, and thus to wait before moving down in the journal rankings. In a way, this could bolster the argument of Card and DellaVigna: if referee reports are so random, how can citations be a worse measure of quality? But I think this ignores the fact that referees will spend more time with a paper than potential citers will. Even if all authors read the papers they cite, how meticulously do they read the papers they don't cite? I think the difficulty of assessing paper quality once again implies that signals should play an important role, not to mention personal connections.
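For what it's worth, here's the same kind of quick check on the two probabilities quoted above. The 3.8% and 7.7% figures are the ones cited from Welch (2014); converting them into an implied correlation for the "Must Accept" indicator is just my own rough illustration, not a calculation from his table:

# Probabilities as quoted above from Welch (2014):
p_ma = 0.038            # unconditional P(a referee recommends "Must Accept")
p_ma_given_ma = 0.077   # P(second referee says "Must Accept" | first one did)

p_both = p_ma_given_ma * p_ma     # joint probability both say "Must Accept"
p_both_if_independent = p_ma ** 2 # what independence between referees would imply

# Phi (Pearson) correlation between the two binary "Must Accept" indicators,
# using equal marginals for both referees.
phi = (p_both - p_ma * p_ma) / (p_ma * (1 - p_ma))

print(f"P(both 'Must Accept'):        {p_both:.4%}")                # ~0.29%
print(f"P(both) under independence:   {p_both_if_independent:.4%}") # ~0.14%
print(f"Implied correlation (phi):    {phi:.3f}")                   # ~0.04

An implied correlation of roughly 0.04 is barely above zero, which is what "much closer to completely random" looks like in numbers.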


