How to prevent your external CRO data from creating a mess

In a drug discovery organisation, keeping track of internal research data can be challenging enough: it has to be stored in accessible formats and named so that people other than its producer can find, understand and use it.

Now add the fact that most pharma and biotech companies also rely on external data, often from several CROs that each play by their own rules on data discipline and tools, and keeping the house in order becomes really difficult.

Smaller pharma companies often lack the resources in-house, so they engage external collaborators or subcontractors, known as contract research organisations (CROs), to run the experiments and produce the data for them.

Bigger pharma companies could probably produce all the data in-house if they wanted to. They choose not to for the classic outsourcing reasons, such as keeping headcount down and economies of scale, or because a CRO has a unique capability.

So, big or small, the reality of modern biopharma research is a mix of internal and externally produced experiments and data.

But what is the problem with managing external data?

The problem with managing external data often comes down to delivery and formatting.

Typically, CROs deliver data as Excel sheets or, even worse, as PowerPoint or PDF reports.

In most cases, the Excel sheets are emailed between the parties or, in the “advanced” cases, uploaded to a SharePoint site from which the receiving pharma company downloads the data files.

The pharma company then has to spend internal time and resources on all the post-processing: QC’ing the data and terminology, and uploading the data to the relevant internal data stores.

That is a lot of extra work and internal resources spent on data that the companies paid others to handle.

But experience tells us that it has to be done to ensure that the data quality is as expected, that the data is aligned with internal vocabularies and units, and that you catch typos, text in number columns (like N/A), data in the wrong columns, etc.
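Several of these checks are easy to automate, which leaves the manual QC free to focus on the scientific review. Below is a minimal Python sketch, assuming the delivered rows have already been read into dictionaries and using a hypothetical `IC50` value column with a separate `Unit` column; the unit table is illustrative, not an agreed standard.

```python
import math

# Hypothetical conversion table from CRO unit spellings to the internal unit (nM).
UNIT_TO_NM = {"nM": 1.0, "nm": 1.0, "uM": 1000.0, "µM": 1000.0}


def qc_rows(rows):
    """Split delivered rows into clean rows (values converted to nM) and issues.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    The column names "IC50" and "Unit" are assumptions for this sketch.
    """
    clean, issues = [], []
    for row_no, row in enumerate(rows, start=1):
        raw = (row.get("IC50") or "").strip()
        unit = (row.get("Unit") or "").strip()
        try:
            value = float(raw)
        except ValueError:
            # Catches text in a number column, e.g. "N/A" or "1,5" (decimal comma).
            issues.append(f"row {row_no}: IC50 is not a number: {raw!r}")
            continue
        if not math.isfinite(value):
            # float() accepts "nan" and "inf", which should not pass QC either.
            issues.append(f"row {row_no}: IC50 is not a finite number: {raw!r}")
            continue
        if unit not in UNIT_TO_NM:
            issues.append(f"row {row_no}: unknown unit {unit!r}")
            continue
        # Align with the internal unit so the row can sit next to internal data.
        clean.append({**row, "IC50": value * UNIT_TO_NM[unit], "Unit": "nM"})
    return clean, issues
```

Rows with issues are held back rather than silently corrected, so a scientist can decide whether to fix them or query the CRO.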

Without that QC step, the externally produced data cannot be stored and analysed together with the internal data.

How do you create a better process?

The most efficient option would be for the CRO to load data directly into the internal data warehouse, but in reality this is often not viable.

Many organisations want to perform their own data quality checks, so there will always be a need for internal QC. On top of that, no one is willing to let external partners into critical internal systems.

That is why, as mentioned, the most common solutions remain for the CRO to upload files to a shared folder or SharePoint site, or simply to send them by email.

Since the manual QC cannot be taken out of the equation, the goal is to make the process as smooth as possible, so that:

The CRO can deliver in an agreed format.
The pharma company can easily perform the internal QC.
The pharma company can easily transfer data to internal systems.

The solution: Data templates and an external data warehouse

The best way to get data delivered in a workable format is to use data exchange templates with agreed columns and terms that describe what the CRO is going to deliver and in what format.

These data templates can be as simple as Excel sheets with an agreed number, order and naming of columns. Some columns can be fitted with dropdowns containing a controlled vocabulary of agreed terms, which ensures consistent naming and spelling, and the sheet can include some simple data checks.
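Writing the agreed template down in machine-readable form as well lets the receiving side check a delivery automatically before anyone opens it. A minimal sketch, assuming the Excel template is exported to CSV; the column names and vocabulary below are hypothetical examples, not a recommended standard.

```python
import csv
import io

# Hypothetical template agreed with the CRO: exact column headers, in order.
TEMPLATE_COLUMNS = ["Compound ID", "Assay", "Species", "Result", "Unit"]

# Hypothetical controlled vocabulary behind the dropdown columns.
CONTROLLED_TERMS = {
    "Assay": {"Microsomal stability", "Plasma protein binding", "Caco-2"},
    "Species": {"Human", "Mouse", "Rat"},
}


def validate_template(csv_text):
    """Check a delivery (template exported as CSV) against the agreed template.

    Returns a list of problems; an empty list means the file conforms.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    if header != TEMPLATE_COLUMNS:
        # If headers are missing or reordered, column positions cannot be trusted.
        return [f"header {header} does not match template {TEMPLATE_COLUMNS}"]
    problems = []
    for values in reader:
        line_no = reader.line_num  # source line of the row just read
        if len(values) != len(TEMPLATE_COLUMNS):
            problems.append(
                f"line {line_no}: expected {len(TEMPLATE_COLUMNS)} values, got {len(values)}"
            )
            continue
        row = dict(zip(TEMPLATE_COLUMNS, values))
        # Dropdown columns must contain one of the agreed terms, spelled exactly.
        for column, allowed in CONTROLLED_TERMS.items():
            if row[column] not in allowed:
                problems.append(f"line {line_no}: {column} {row[column]!r} is not an agreed term")
    return problems
```

Stopping at a mismatched header is a deliberate choice: checking the dropdown values only makes sense once you know each value sits in the column you think it does.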

The CRO can then load the completed templates directly into a data warehouse that sits outside the pharma company's firewall. It acts as a staging area that does not jeopardise data security, and where scientists or project managers can easily review and quality-assure the data.

Once the data is quality controlled, it can easily be pushed to the internal systems.

In short: create a pre-database, a "CRO data store", outside the pharma company's firewall, and then set up an automated or semi-automated sync process from the CRO data store to the internal pharma data warehouse.
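The sync step has two jobs: only QC-approved data should cross into the internal warehouse, and nothing should be copied twice. The sketch below uses two SQLite databases to stand in for the external CRO data store and the internal warehouse; the table names, the `qc_status` and `synced` columns, and the 'approved' status value are all hypothetical.

```python
import sqlite3


def sync_approved(cro_store: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Copy QC-approved, not-yet-synced results from the external CRO data store
    to the internal warehouse, then mark them as synced. Returns the number copied.

    Hypothetical schemas:
      CRO store:  results(id, compound_id, assay, result, qc_status, synced)
      warehouse:  assay_results(source_id UNIQUE, compound_id, assay, result)
    """
    rows = cro_store.execute(
        "SELECT id, compound_id, assay, result FROM results "
        "WHERE qc_status = 'approved' AND synced = 0"
    ).fetchall()
    # The connection context manager commits on success and rolls back on error.
    with warehouse:
        # UNIQUE(source_id) plus OR IGNORE keeps a re-run from duplicating rows
        # if the marking step below failed last time.
        warehouse.executemany(
            "INSERT OR IGNORE INTO assay_results (source_id, compound_id, assay, result) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    with cro_store:
        cro_store.executemany(
            "UPDATE results SET synced = 1 WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return len(rows)
```

Running a function like this on a schedule makes the sync automated; triggering it by hand after a review session makes it semi-automated. Either way, the internal warehouse only receives data someone has approved.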

This way, you retain control of the data and replace the manual collection and reformatting of data with a structured, efficient process in which data can easily be quality-assured and then stored together with internal data.

What can you achieve?

There are several obvious benefits to optimising this process.

Speed

You cut down on the manual hours needed to align data structure and perform quality control, so you reach your goals faster.

Resources

You spend fewer internal resources on a task you outsourced precisely to avoid handling it internally.

Quality

You increase data quality, because agreed templates and controlled vocabularies catch errors where the data is produced.

And last but certainly not least, you gain the ability to use and compare internal and external data, no matter where or by whom it was produced, at a speed that matches internally produced data.