Parsing and Cell Reading

By Sarah Michaels

Parsing is the process of analyzing a string of symbols, whether in natural language, computer languages, or data structures, according to the rules of a formal grammar. Beaconcure’s approach focuses on analyzing clinical data structures that appear in tables created with SAS programming.

Parsing is how the algorithm takes raw clinical tables and puts them into a defined structure. It places each cell of the original table in its appropriate location. This is the basis for any further analysis of the data: only after a successful parsing process can calculations be applied to it.
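To make the idea of placing each cell in its appropriate location concrete, here is a minimal sketch in Python. It is not Beaconcure’s parser: it assumes a simple, well-formed HTML table, uses only the standard library, and the names `TableGridParser` and `parse_table`, along with the sample values, are made up for illustration.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect table cells into a grid keyed by (row, column)."""

    def __init__(self):
        super().__init__()
        self.grid = {}      # (row, col) -> cell text
        self._row = -1      # index of the current <tr>
        self._col = 0       # column where the next cell will be placed
        self._span = 1      # colspan of the cell being read
        self._text = None   # None means "not inside a cell"

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            self._col = 0
        elif tag in ("td", "th"):
            self._span = int(dict(attrs).get("colspan") or 1)
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            # Normalize whitespace, then store the cell at its grid position.
            self.grid[(self._row, self._col)] = " ".join("".join(self._text).split())
            # A cell spanning several columns pushes the next cell further right.
            self._col += self._span
            self._text = None


def parse_table(html):
    """Hypothetical helper: return {(row, col): text} for one HTML table."""
    parser = TableGridParser()
    parser.feed(html)
    return parser.grid


# Illustrative table with made-up values.
SAMPLE = """
<table>
  <tr><th>Parameter</th><th colspan="2">Treatment</th></tr>
  <tr><td>Age (mean)</td><td>54.2</td><td>55.0</td></tr>
</table>
"""

grid = parse_table(SAMPLE)
for position in sorted(grid):
    print(position, grid[position])
```

Once every cell has a known row and column, later steps can look up a value by position instead of searching through raw text. Real clinical tables add complications this sketch ignores, such as row spans, nested headers, and footnotes.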

Tables created with SAS programming can be generated in different structures and file formats, including PDF, RTF, and HTML. Each format requires a different parsing technique and consequently has its own challenges.

If you know a little about this, you might think: “There are some excellent open-source tools available, so why not use those?”

The answer is that Beaconcure’s parser produces much more accurate results due to our expertise in clinical tables.

For example, compared with Amazon’s parser, which is widely used in the tech industry, there is a substantial difference in quality, and that is a gap the clinical trials field cannot risk.

Image 2: Amazon’s parsing technology, with the discrepancies found in the document highlighted in yellow.

One might look at a table and see a complete, structured dataset, but not everything that appears clear to the human eye is easily understood by an algorithm. Beaconcure analyzes many types and forms of tables, and our sophisticated parsing algorithms are equipped to handle varying structures and data in all types of studies.

It’s important to mention that if the original tables are not organized in a standard way, the parsing can’t reflect the data correctly. This can lead to unsatisfactory results, however accurate the calculations themselves may be. For that reason, maintaining a unified structure across tables is very important, so that any discrepancy in the data can be located quickly and efficiently.
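As a rough illustration of what checking for a unified structure could look like, the sketch below compares the header rows of several already-parsed tables and flags the ones that deviate from the most common layout. This is an assumption about one possible check, not a description of Beaconcure’s method; the function name `find_nonstandard_tables` and the table names are hypothetical.

```python
from collections import Counter


def find_nonstandard_tables(headers_by_table):
    """Return names of tables whose header row differs from the most common layout."""
    # Normalize case and surrounding whitespace so cosmetic differences are ignored.
    layouts = {
        name: tuple(header.strip().lower() for header in headers)
        for name, headers in headers_by_table.items()
    }
    if not layouts:
        return []
    # Treat the most frequent header layout as the standard one.
    standard_layout, _ = Counter(layouts.values()).most_common(1)[0]
    return sorted(name for name, layout in layouts.items() if layout != standard_layout)


# Illustrative input: header rows keyed by a made-up table name.
tables = {
    "t1": ["Parameter", "Placebo", "Active"],
    "t2": ["parameter ", "Placebo", "Active"],
    "t3": ["Active", "Placebo", "Parameter"],
}

outliers = find_nonstandard_tables(tables)
print(outliers)
```

Here only the table with reordered columns is reported, because differences in case and stray spaces are normalized away before comparison.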
