Excel is a lovely tool that I guess we all use for our small, private data overviews and analyses. It's been around for so long that most of us reach for "the spreadsheet" by default whenever we need to keep track of numbers in rows and columns.
Nothing wrong with that. In general. Calculate a number and move on. But “quick and easy” can easily become “slow and clunky” when the task at hand is more complex - like analysis in pre-clinical drug discovery.
Copy/paste errors and FAIRness issues
The problems arise when Excel is being used as the data management platform - and maybe even as the storage solution, with files in folders in a file system structure "that made sense when we created it".
There are the obvious issues like:
- data entry and copy/paste errors
- data analysis mistakes when the data don't quite fit the template
- missing FAIRness, as only the producer can find the files and the metadata are not aligned (and change a bit every time!)
- files - and the data in them - that can often only be used by one person: the producer
And sometimes Excel is also used to support longer, complex workflows where the user really has to be careful to avoid copy/paste mistakes and to make sure the relevant data are linked to the relevant compound, plate, bacterium, cell, etc.
Faster from hypothesis to answer
In these cases the risk of making mistakes is high - but often the users are skilled, trained and very diligent - so the real issue from a company perspective (on top of the FAIRness issues) is the time it takes to process the data manually in Excel.
Drug discovery - and biopharma research in general - is a constant cycle: form a hypothesis, produce (modalities or data), verify or reject the hypothesis. Repeat.
All in order to identify the best candidate to meet the defined target product profile - i.e., the one with the best perceived chance of success of becoming a future drug on the market.
Hence, speed is important. The faster one can move from hypothesis to answer, the faster a new cycle can be initiated - and, hopefully, in the end the faster a drug candidate can be found and a new drug introduced to the market.
It’s easy to find examples of how much money "a drug faster to market" can mean. I will leave it at this: faster can also mean a lot in terms of $$.
Therefore, any process/speed optimisations in the data production pipeline are very welcome (even needed), and any steps where data are manually processed as part of a standard workflow should in general be avoided. As a welcome by-product, the data will often also be stored in a more FAIR manner when processed in a (semi-)automatic fashion by a data management platform or tool.
Curve fitting can be a very manual process - or not
As a concrete example, let's look at a classic plate-based in vitro curve fit experiment of the kind conducted in so many labs around the world, both in the biopharma industry and in academia.
It's so simple: you feed a plate to a reader, receive the output reader file and would like to end up with a sigmoidal concentration-response curve from which you can read the IC50!
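For reference, the "sigmoidal curve" in question is typically the four-parameter logistic (Hill) model - the parameter names below are just the conventional ones, not tied to any particular software:

$$
\text{response}(c) = \text{bottom} + \frac{\text{top} - \text{bottom}}{1 + \left(\mathrm{IC}_{50}/c\right)^{h}}
$$

where $c$ is the compound concentration, $h$ is the Hill slope, bottom and top are the lower and upper plateaus, and the IC50 is the concentration giving the half-maximal response.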
But how do you get from the reader file to the curves? And how do you handle all the steps in between? Manually in Excel?
The fact is - as we all know - there are many details and steps to fill in between.
- What is the plate layout?
- In what wells are the controls?
- What are the concentrations of the controls and the compounds?
- How do we normalise the data?
It's not until you have a nice list of concentration:normalised-value pairs that you can start to perform a curve fit.
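To make the normalisation step concrete, here is a minimal Python sketch. The plate layout is a pure assumption for illustration: a 96-well plate read in as an 8 x 12 grid, with column 1 holding the high (0 % inhibition) controls, column 12 the low (100 % inhibition) controls, and the compound dilutions in between. Your real layout will differ - which is exactly why the layout has to be captured somewhere.

```python
import numpy as np

# Hypothetical 96-well plate exported by the reader as an 8 x 12 CSV grid.
# Assumed layout (illustration only):
#   column 1     = high controls (maximum signal, 0 % inhibition)
#   column 12    = low controls  (minimum signal, 100 % inhibition)
#   columns 2-11 = compound dilution series
raw = np.loadtxt("reader_output.csv", delimiter=",")  # shape (8, 12)

high_ctrl = raw[:, 0].mean()    # mean signal of the high controls
low_ctrl = raw[:, 11].mean()    # mean signal of the low controls

# Percent inhibition for every compound well:
# 0 % at the high-control level, 100 % at the low-control level.
pct_inhibition = 100.0 * (high_ctrl - raw[:, 1:11]) / (high_ctrl - low_ctrl)
```

Pairing these normalised values with the right concentrations is the part that still depends on the plate layout and dilution scheme - the step that is so easy to get wrong with copy/paste.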
Using Excel to support all these steps is possible, but that doesn’t mean it’s a good idea. It will require a lot of manual copy-paste work with the risk of making mistakes in the process. By using a data management tool that supports this specific workflow you can skip a lot of the manual steps and get to the finish line faster - verify or reject. Next!
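And for completeness, a minimal sketch of the final fit itself - the kind of step a good tool runs automatically once the normalised data are in place. The concentrations and responses below are made-up example values, and the model is the four-parameter logistic shown earlier, fitted here with SciPy's curve_fit:

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(conc, bottom, top, ic50, hill):
    """Four-parameter logistic (Hill) model for a concentration-response curve."""
    return bottom + (top - bottom) / (1.0 + (ic50 / conc) ** hill)

# Made-up example data: concentration in uM and normalised response (% inhibition).
conc = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
resp = np.array([2.0, 5.0, 12.0, 30.0, 55.0, 78.0, 92.0, 97.0])

# Initial guesses: bottom, top, IC50, Hill slope.
p0 = [0.0, 100.0, 1.0, 1.0]
params, _ = curve_fit(four_pl, conc, resp, p0=p0)

print(f"IC50 ~ {params[2]:.2f} uM, Hill slope ~ {params[3]:.2f}")
```

A handful of lines, yes - but only because the messy part (layout, controls, concentrations, normalisation) was already handled before this point.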
Speed is of the essence in pre-clinical drug discovery, so make sure you use tools that are built with that in mind.