About 20% of genomics papers with Excel gene lists contain gene name errors caused by Excel

Dave W. Shanahan


According to three scientists, Mark Ziemann, Yotam Eren, and Assam El-Osta, Microsoft Excel incorrectly converts gene names into other values. In the abstract of their article, titled “Gene name errors are widespread in the scientific literature,” the scientists explain:

“The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.”

It’s easy to see why Excel might mishandle certain gene names once you look at the gene symbols the scientists use as examples:

“For example, gene symbols such as SEPT2 (Septin 2) and MARCH1 [Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase] are converted by default to ‘2-Sep’ and ‘1-Mar’, respectively. Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’). Since that report, we have uncovered further instances where gene symbols were converted to dates in supplementary data of recently published papers (e.g. ‘SEPT2’ converted to ‘2006/09/02’). This suggests that gene name errors continue to be a problem in supplementary files accompanying articles. Inadvertent gene symbol conversion is problematic because these supplementary files are an important resource in the genomics community that are frequently reused. Our aim here is to raise awareness of the problem.”
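To make the conversions described above concrete, here is a minimal Python sketch of how one might flag suspicious entries in a gene list column. It is not the authors' actual scan: the patterns, and the helper names `classify` and `scan_gene_list`, are hypothetical and cover only the three output forms quoted above ("2-Sep", "2006/09/02", and "2.31E+13"). Real Excel output can differ by locale and version, so a practical checker would likely need more patterns.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical patterns, assumed for illustration only:
# day-month dates like "2-Sep" or "1-Mar" (from SEPT2 / MARCH1),
# full dates like "2006/09/02", and scientific notation like "2.31E+13"
# (from RIKEN accessions such as "2310009E13").
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DAY_MONTH = re.compile(rf"^\d{{1,2}}-(?:{MONTHS})$", re.IGNORECASE)
FULL_DATE = re.compile(r"^\d{4}/\d{2}/\d{2}$")
SCI_NOTATION = re.compile(r"^\d+(?:\.\d+)?E[+-]\d+$", re.IGNORECASE)


def classify(value: str) -> Optional[str]:
    """Return the kind of suspected conversion, or None if the value looks intact."""
    v = value.strip()
    if DAY_MONTH.match(v):
        return "date (day-month)"
    if FULL_DATE.match(v):
        return "date (full)"
    if SCI_NOTATION.match(v):
        return "floating point"
    return None


def scan_gene_list(values: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (index, value, kind) for each entry that looks like a mangled gene name."""
    for i, value in enumerate(values):
        kind = classify(value)
        if kind is not None:
            yield i, value, kind


if __name__ == "__main__":
    # Example column mixing intact symbols with converted ones.
    column = ["TP53", "2-Sep", "1-Mar", "2006/09/02", "2.31E+13", "BRCA1"]
    for i, value, kind in scan_gene_list(column):
        print(i, value, kind)
```

Intact symbols such as "SEPT2" or the raw accession "2310009E13" are not flagged, because the patterns only match the converted forms.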

These scientists didn’t have to write a scientific paper about the problems Microsoft Excel causes. An easier way “to raise awareness of the problem” would be to post on Excel UserVoice or to reach out to the Excel team on Twitter for a faster response. It is a bit alarming that a fifth of these papers contain errors caused by Excel, but it’s even more puzzling that scientists haven’t tried to find a way to solve the problem. This paper is not the first of its kind, as a quick Bing search reveals.

If you are interested in reading their full scientific paper, go here. Let us know in the comments if you think this is something Microsoft needs to address in Excel.