Authored by: Bilal Babar & Rachel Howard
The conversation around synthetic data is rapidly gaining traction, but many still question whether it is a technical curiosity rather than a commercial opportunity. At Research Partnership, we think differently. Synthetic data is no longer just experimental, it’s becoming a strategic tool for agile decision-making in pharma. Our experience working across the spectrum of synthetic data – from basic stimuli creation to predictive simulation, has shown us its strategic potential to pharma.
In healthcare business intelligence, synthetic data refers to artificially generated information that mimics real-world patterns or conditions. It can serve many functions: it can expand the scope of existing datasets, simulate stakeholder behaviors, and improve how we design, test, and refine ideas.
Interestingly, as the field evolves, we’re noticing a subtle shift in terminology. “Simulated data” is increasingly used in place of “synthetic” by those seeking to avoid connotations of artificiality or inauthenticity. While “synthetic” remains widely used, this emerging language reflects a growing recognition that the value lies in how the data is constructed and applied, rather than in how it is generated.
We’ve already seen this principle in action with predictive eye-tracking: machine learning trained on large datasets of eye movements to estimate how people visually engage with new materials. Once considered experimental, this approach has moved into the mainstream, to a point where physical eye-tracking is no longer even required. Today, we can take it even further – not just predicting where people’s eyes will move, but how they’re likely to respond to questions we haven’t asked them, by combining large historical data sets with behavioral science heuristics, therapy area trend reports and large language models.
But not all synthetic data is created equal. Its value depends heavily on the quality of the underlying dataset, and the decision you’re trying to make.
We think about synthetic data as existing within a “corridor of truth” – a zone in which simulated insights can be useful, provided they stay close enough to the ground truth (accurate, factual, and verified information) to be trusted.
The question isn’t just “Can we simulate this?”, it’s “Should we trust the result, given the decision we need to make?”
If the use case is low stakes, such as exploratory ideation or early-stage inputs to inform creative development, then simulated data from synthetic respondents can be valuable to advance our thinking quickly and cost effectively. Especially when we’re interpolating from data we already have. Similarly, when the alternative is no primary data, as is often the case in the early commercial stage where forecasts rely heavily on secondary data and Key Opinion Leader perspectives alone, adding synthetic data from practicing physicians can represent a meaningful improvement on previous practices. But as the risk of making the wrong decision increases, and as our questions stretch beyond the boundaries of known data, the need for caution rises. Real data, coupled with expert validation, becomes essential to stay within a safe space of inference.
Where synthetic data becomes truly strategic is when it enhances robust, well-characterized real-world datasets. With enough high-quality inputs, models can simulate behaviors and stress-test assumptions. This can improve forecasting, inform scenario planning, and accelerate message optimization, provided we guard against overfitting or false certainty.
Hallucinations, where models generate answers that sound plausible but lack factual grounding, can reinforce existing assumptions or create echo chambers, especially when models are not retrained regularly. This becomes particularly risky when simulating complex or evolving stakeholder views. While models excel at generating coherent responses, they don’t understand in the human sense. Their outputs reflect statistical likelihoods drawn from training data rather than verified truths, which is why validation through comparison with hold-out data (a portion of a dataset that is intentionally excluded from the model training process and reserved for independent evaluation of the model’s performance) and subject matter expert review remains a critical step in any synthetic application.
As with any AI-driven method, we must also remain alert to the risk of embedded bias, especially around race, gender, geography, and cultural context. Synthetic models trained on unbalanced data may amplify existing inequities or overlook underrepresented perspectives. This is another reason human oversight is essential. We’re actively exploring bias detection and mitigation techniques to ensure simulations are inclusive and representative.
Ultimately, synthetic data is only as valuable as the decisions it informs. Our guiding principle is to align our use of synthetic data to the business context:
This corridor-based mindset helps us blend agility with rigor. We routinely assess each brief and audit the available data to determine where synthetic augmentation can safely and meaningfully add value.
Perhaps counterintuitively, synthetic data can also support ethical, patient-centric research practices. In rare disease, for instance, where patient populations are small and often overburdened, synthetic patient personas can be used to explore early hypotheses or test “quick questions” without repeatedly drawing on the same individuals. Synthetic patients can help reduce research fatigue and move research along by ensuring the versions of materials taken for testing are closer to ‘patient ready’. But they should never be mistaken for the authentic patient voice. Synthetic patients don’t care when you misunderstand them. Real ones will.
Synthetic data isn’t a substitute for primary market research. However, we have found ways it can be effectively employed to open up new ways of thinking and are excited to continue exploring its potential. If used responsibly and with expert oversight, we have found synthetic data can accelerate learning and enable earlier, more confident decision-making.
At Research Partnership, we evaluate each brief against this spectrum, auditing available evidence and advising on the most appropriate approach. Handled strategically, synthetic data sharpens research questions, encourages creativity in study design, and ultimately delivers more confident insights.
Contact us to schedule a meeting with one of our experts.
Jump to a slide with the slide dots.
Discover how agile, AI-powered qualitative research delivers rapid, strategic insights that drive smarter decisions in global healthcare.
Read moreDiscover how AI avatars are transforming healthcare market research with immersive stimuli, interactive simulations, and real-time insights for better
Read moreHarmonize pharma brand tracking to unify insights, boost agility, and drive strategic decisions across your entire portfolio.
Read moreRapport is our monthly newsletter where we share our latest expertise and experience.