Blog

Data Mining of Tables: The Barrier for Automation

By Sarah Michaels

Phuse US Connect 2022 Abstract – Accepted for Presentation

Currently, statistical analysis outputs are validated manually by SAS programmers and biostatisticians. There are two main reasons for this:

The process of generating and validating statistical analysis outputs is not standardised. Specifications and definitions are written in Word documents and Excel files.
Statistical analysis outputs are static, flat files (PDF, RTF, HTML, etc.). They carry no metadata, and the data hierarchy within and between outputs is not recorded anywhere. The files are made for human review, not for software analysis.

Before any kind of automation can be developed, we need an automated solution for converting these static files into machine-readable formats.

To achieve that, we must follow these steps:

Parsing: converting raw files into an abstract structure.
Terminology: utilizing information contained in the files, such as abbreviations, clinical terms and synonyms.
Header classification: identifying column and row headers automatically.
Cell characteristics: adding metadata to each cell.
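The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software described in this abstract: it assumes the output has already been exported to plain text with columns separated by two or more spaces, that the first row holds the column headers and the first column holds the row headers, and it uses a small hypothetical abbreviation dictionary (`TERMS`). The names `parse`, `classify`, `annotate` and `Cell` are likewise illustrative.

```python
import re
from dataclasses import dataclass, field

# Hypothetical abbreviation dictionary for step 2; a real system would draw on
# curated clinical terminology rather than a hard-coded mapping.
TERMS = {"N": "Number of subjects", "SD": "Standard deviation"}

# Hypothetical plain-text export of a static summary table.
SAMPLE = """\
Statistic    Placebo    Drug A
N            50         48
Mean         5.2        4.1
SD           1.3        1.1
"""


@dataclass
class Cell:
    value: str
    row: int
    col: int
    role: str = "data"  # "column_header", "row_header" or "data"
    metadata: dict = field(default_factory=dict)


def parse(text: str) -> list[list[str]]:
    """Step 1: split non-blank lines into cells on runs of two or more spaces."""
    return [re.split(r"\s{2,}", line.strip()) for line in text.splitlines() if line.strip()]


def classify(rows: list[list[str]]) -> list[Cell]:
    """Step 3: positional rule; row 0 holds column headers, column 0 holds row headers."""
    cells = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            if r == 0:
                role = "column_header"
            elif c == 0:
                role = "row_header"
            else:
                role = "data"
            cells.append(Cell(value, r, c, role))
    return cells


def annotate(cells: list[Cell]) -> list[Cell]:
    """Steps 2 and 4: attach header context and expanded terms to every data cell."""
    col_headers = {cell.col: cell.value for cell in cells if cell.role == "column_header"}
    row_headers = {cell.row: cell.value for cell in cells if cell.role == "row_header"}
    for cell in cells:
        if cell.role == "data":
            row_label = row_headers.get(cell.row)
            cell.metadata = {
                "column": col_headers.get(cell.col),
                "row": row_label,
                "row_term": TERMS.get(row_label, row_label),
            }
    return cells


if __name__ == "__main__":
    for cell in annotate(classify(parse(SAMPLE))):
        if cell.role == "data":
            print(cell.value, cell.metadata)
```

Real outputs rarely follow such a fixed layout (spanning headers, wrapped labels, footnotes), so a production system would need a far more robust header classifier than the positional rule used here.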

As a result, the information for every cell resides in a structured database, where it can finally be put to use.
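Once every cell carries its header context, some checks that previously needed a human reviewer become database queries. The sketch below uses Python's built-in `sqlite3` module with a hypothetical one-table schema and made-up sample records; `build_db`, `lookup` and `is_inconsistent` are illustrative helpers, shown here checking whether the number of subjects in a treatment arm agrees across two outputs.

```python
import sqlite3

# Hypothetical schema: one row per data cell, stored with its header context.
SCHEMA = """
CREATE TABLE cells (
    output_id     TEXT,
    row_idx       INTEGER,
    col_idx       INTEGER,
    value         TEXT,
    row_header    TEXT,
    column_header TEXT
)
"""

# Made-up records for two hypothetical outputs, for illustration only.
RECORDS = [
    ("output_A", 1, 1, "50", "N", "Placebo"),
    ("output_A", 1, 2, "48", "N", "Drug A"),
    ("output_A", 3, 2, "1.1", "SD", "Drug A"),
    ("output_B", 1, 1, "50", "N", "Placebo"),
    ("output_B", 1, 2, "47", "N", "Drug A"),
]


def build_db(records):
    """Create an in-memory database and load the cell records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO cells VALUES (?, ?, ?, ?, ?, ?)", records)
    return conn


def lookup(conn, row_header, column_header):
    """Return (output_id, value) for every cell with the given header context."""
    return conn.execute(
        "SELECT output_id, value FROM cells "
        "WHERE row_header = ? AND column_header = ? ORDER BY output_id",
        (row_header, column_header),
    ).fetchall()


def is_inconsistent(conn, row_header, column_header):
    """True if cells sharing this header context hold different values across outputs."""
    (distinct,) = conn.execute(
        "SELECT COUNT(DISTINCT value) FROM cells "
        "WHERE row_header = ? AND column_header = ?",
        (row_header, column_header),
    ).fetchone()
    return distinct > 1


if __name__ == "__main__":
    conn = build_db(RECORDS)
    print(lookup(conn, "N", "Drug A"))
    print(is_inconsistent(conn, "N", "Drug A"))  # the two outputs report 48 vs 47
```

A mismatch like this is a flag for a reviewer rather than proof of an error, since outputs based on different analysis populations can legitimately differ.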

Table mining process example: the static output carries no information about its cells.

The examples below show how the software automatically identifies the hierarchy of cells.

Static Output

Post Table Mining Process
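One simple heuristic for recovering row hierarchy, assuming (hypothetically) that nesting in the static output is expressed by leading spaces in the row labels, as is often the case in summary tables, is to keep a stack of ancestor labels while scanning the rows. The sketch below illustrates that heuristic only; it is not the method used by the software shown in the figures, and `row_hierarchy` and `indent_unit` are hypothetical names.

```python
def row_hierarchy(labels, indent_unit=2):
    """Return, for each row label, the tuple of labels from the top level down to it.

    Depth is inferred from leading spaces (indent_unit spaces per level), a
    hypothetical convention; tabs and other layout cues are not handled.
    """
    stack = []  # (depth, label) pairs for the current chain of ancestors
    paths = []
    for raw in labels:
        depth = (len(raw) - len(raw.lstrip(" "))) // indent_unit
        label = raw.strip()
        # Pop entries at the same or a deeper level: they are earlier siblings
        # or their descendants, not ancestors of this row.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        paths.append(tuple(name for _, name in stack) + (label,))
        stack.append((depth, label))
    return paths


if __name__ == "__main__":
    rows = ["Age (years)", "  Mean", "  SD", "Sex", "  Male", "  Female"]
    for path in row_hierarchy(rows):
        print(" > ".join(path))
```

Running it prints paths such as `Age (years) > Mean` and `Sex > Female`, which is the kind of parent/child link that the static output does not record.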

In our presentation, we will describe the challenges and implications of implementing such a process.
