Unlock FHIR Epic: Master OMOP ETL Integration

Dr. Rachel Green
April 8, 2026
18 min read

You usually hit this problem after the exciting part is over.

You already got Epic access approved. You can authenticate, call a few FHIR endpoints, and pull back valid JSON. Then the main work starts. The analytics team wants OMOP tables, the research team wants reproducible cohorts, and your first pass at the data shows the usual gap between interoperable exchange and analytically standardized data.

That gap is why Epic FHIR projects often stall after the demo phase. Extracting a Patient or Observation resource is straightforward. Turning those resources into a reliable OMOP CDM pipeline is not. The hard part is translation, especially vocabulary translation.

Bridging the Gap from Epic EHR to OMOP Analytics

Epic on FHIR matters because it gives teams a modern access layer into a major EHR. Epic describes its FHIR approach as a core part of healthcare interoperability, built on the HL7 FHIR standard and aligned with USCDI data classes such as demographics, observations, and medications through its developer platform at fhir.epic.com.

That matters operationally, but it does not solve analytics by itself.

A research-ready OMOP environment expects standardized structures, standardized concepts, and enough lineage to explain how each source field became an OMOP record. Epic FHIR gives you transport and structure. OMOP requires semantic normalization. Those are different jobs.

Where projects usually break

Many teams underestimate the middle layer.

They assume the path looks like this:

  1. Call Epic FHIR API
  2. Parse JSON
  3. Load OMOP tables

The full path is longer:

  • Extract discrete resources: Patient, Observation, Condition, MedicationRequest, AllergyIntolerance, Appointment
  • Stage raw payloads: Preserve source JSON and identifiers
  • Normalize source fields: Units, timestamps, performer context, encounter references
  • Resolve vocabulary mappings: Convert source codes into OMOP standard concepts
  • Load target tables: PERSON, MEASUREMENT, CONDITION_OCCURRENCE, DRUG_EXPOSURE, and others
  • Audit lineage: Keep enough traceability to rerun or defend each mapping decision
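The six stages above can be sketched as a thin pipeline skeleton. This is a minimal illustration, not a real implementation: every function name and field here is hypothetical, and each stage would carry far more logic in production.

```python
# Minimal sketch of the six-stage path above. All names are illustrative.

def extract(resource_type, window):
    """Pull raw FHIR resources for a date window (stubbed here)."""
    return [{"resourceType": resource_type, "id": "obs-1"}]

def stage(resources):
    """Preserve raw payloads and source identifiers before any mutation."""
    return [{"raw": r, "source_id": r.get("id")} for r in resources]

def normalize(staged):
    """Normalize units, timestamps, and encounter references."""
    for row in staged:
        row["normalized"] = True
    return staged

def map_vocab(rows):
    """Resolve source codes to OMOP standard concepts (stubbed)."""
    for row in rows:
        row["concept_id"] = 0  # 0 is the OMOP convention for "unmapped"
    return rows

def load(rows):
    """Write target tables; return rows for lineage auditing."""
    return rows

def audit(rows):
    """Keep enough traceability to rerun or defend each mapping decision."""
    return {
        "loaded": len(rows),
        "unmapped": sum(1 for r in rows if r["concept_id"] == 0),
    }

report = audit(load(map_vocab(normalize(stage(extract("Observation", "2026-01"))))))
```

The value of the shape is that each stage has one job, so a vocabulary failure never hides inside an extraction bug.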

That is why a general engineering guide like ETL with Python is still useful in this domain. The orchestration patterns are familiar. The healthcare-specific burden comes from terminology, provenance, and compliance.

Tip: Build your Epic-to-OMOP pipeline as two products, not one. Product one extracts and stages valid FHIR data. Product two maps and loads standardized OMOP facts. When teams merge those concerns too early, debugging becomes painful.

Epic source models are not your analytics model

Another trap is assuming Epic FHIR replaces every native Epic data source. It does not.

FHIR is excellent for interoperable access and many operational workflows. For historical reporting, local finance logic, and some encounter-level detail, teams still compare against other Epic exports and warehouse patterns. If you work in a mixed Epic estate, it helps to understand how the native reporting model differs from the API view. This overview of the Epic Clarity data model is a good companion for that comparison.

The practical takeaway is simple. Epic FHIR is the cleanest door into EHR data for many integration and secondary-use cases, but OMOP success depends on what you do after the API call.

How Epic Implements FHIR APIs

Epic’s FHIR server is easiest to understand if you treat it like a structured library.

Each resource is a shelf. Each search parameter is an index. SMART on FHIR is the security desk that decides who gets in, under what context, and with which permissions.


The core pieces that matter in practice

Epic supports DSTU2, STU3, and R4, with ongoing expansion, according to Epic’s technical specifications at open.epic.com/TechnicalSpecifications. For implementers, that means two things. First, you need to confirm the version and supported interactions on the specific target environment. Second, your parser and transformation code should not assume every installation behaves identically.

The resource set many teams touch first is predictable:

  • Patient: demographics and identifiers
  • Observation: vitals, labs, and many measurement-like facts
  • Condition: diagnoses and problem-list style data
  • MedicationRequest: ordered medications
  • AllergyIntolerance: allergy and reaction data
  • Appointment: scheduling and operational workflows

If you need a refresher on the broader mechanics of endpoint design and FHIR interaction patterns, this overview of the FHIR API is useful before you wire a production extractor.

SMART on FHIR is more than login

Many teams describe SMART on FHIR as “OAuth for Epic,” which is directionally right but operationally incomplete.

SMART on FHIR governs launch context, scopes, and the user or system identity behind a request. That context changes what your app can do. An embedded app launched in clinician context behaves differently from a backend ETL service running system-to-system. If you design your extract process as if all access modes are interchangeable, you usually end up rewriting auth and permission logic later.

For ETL, the main question is this: are you building a user-context app, a backend service, or both? Answer that early.
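For the backend-service lane, the SMART Backend Services token request has a well-defined shape. A minimal sketch, assuming you already hold a signed JWT client assertion (key registration, signing, and the exact scopes your Epic environment grants are out of scope here):

```python
def build_backend_token_request(token_url, signed_jwt, scope="system/Observation.read"):
    """Assemble the form body for a SMART Backend Services token request.

    The JWT must already be signed with the key registered for this client;
    the scope shown is a placeholder example, not Epic guidance.
    """
    return {
        "url": token_url,
        "data": {
            "grant_type": "client_credentials",
            "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
            "client_assertion": signed_jwt,
            "scope": scope,
        },
    }

# Usage: POST req["data"] as form-encoded fields to req["url"]
req = build_backend_token_request("https://example.org/oauth2/token", "signed.jwt.here")
```

Keeping the request assembly separate from the HTTP call makes the auth flow testable without a live Epic endpoint.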

Real-time APIs and bulk export serve different jobs

Teams often start with resource-by-resource GET requests because they are easy to test.

That is fine for:

  • chart enrichment
  • point-of-care apps
  • patient-facing workflows
  • limited incremental extraction

It is not always the right pattern for large historical pulls. Epic also supports Bulk Data Access, often called Flat FHIR, for large-scale export scenarios. Use that when your job is population extraction rather than transaction-time retrieval.

A useful dividing line is simple:

  • Real-time resource API: best fit for current patient context, incremental refreshes, and embedded apps; a weak fit for large backfills.
  • Bulk Data Access: best fit for population exports, historical loads, and analytic refreshes; a weak fit for interactive chart workflows.

Search behavior affects ETL design

One technical detail from Epic’s implementation matters more than many teams realize. Epic documents a post-filtering search mechanism for resources such as Observation. The server first retrieves results based on native search parameters, then applies additional post-filters, which improves query efficiency and reduces payload size in supported cases, as documented at open.epic.com/TechnicalSpecifications.

That sounds minor. It is not.

If your ETL depends on tightly scoped searches, you need to know which parameters are native, which are post-filters, and how each installation exposes them. This affects page size, latency, retry logic, and whether you over-fetch records and filter them downstream.


Tip: Always validate against the specific Epic instance’s CapabilityStatement before finalizing search logic. The docs show what Epic supports broadly. The instance tells you what your integration can use.
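A small helper makes that CapabilityStatement check concrete. This sketch reads the declared search parameters for one resource type from an already-fetched CapabilityStatement; the fragment below is an illustrative FHIR-shaped sample, not real Epic output.

```python
def supported_search_params(capability_statement, resource_type):
    """Return the search parameters a server declares for one resource type."""
    for rest in capability_statement.get("rest", []):
        for res in rest.get("resource", []):
            if res.get("type") == resource_type:
                return [p.get("name") for p in res.get("searchParam", [])]
    return []

# Tiny illustrative fragment of a CapabilityStatement
cap = {
    "resourceType": "CapabilityStatement",
    "rest": [{
        "mode": "server",
        "resource": [{
            "type": "Observation",
            "searchParam": [{"name": "patient"}, {"name": "category"}, {"name": "date"}],
        }],
    }],
}

print(supported_search_params(cap, "Observation"))  # → ['patient', 'category', 'date']
```

Run this against each target instance before finalizing search logic, and fail loudly when a parameter your extractor depends on is missing.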

Practical Integration Scenarios and Data Flows

The best way to evaluate Epic FHIR is to look at the data flow, not the endpoint list.

A system can expose excellent APIs and still create a weak downstream implementation if you choose the wrong pattern. In practice, teams usually land in one of two lanes. They either need real-time EHR enrichment, or they need an ETL path into a research and analytics model such as OMOP.


Scenario one with real-time chart enrichment

A clinician opens a patient chart. An embedded app launches in context, requests the patient identifier and session context, then queries resources such as Observation or AllergyIntolerance.

This pattern works when your app needs the latest state. It can support care guidance, surfacing recent labs, or flagging an allergy before an order workflow proceeds.

The operational improvement is obvious because people stop chasing fragmented sources. Epic-centered FHIR workflows provide real-time, secure access to discrete patient data and help break silos across systems. One industry summary notes that Epic’s EHR footprint covers patient records globally, and that automating data exchange can reduce administrative workflow time compared with manual methods in some implementations, as described in this review of Epic on FHIR integration in healthcare.

What does not work well here is using a chart-time app to become your enterprise extraction engine. The moment you need large backfills, reconciliation, or standardized cohort building, this architecture starts bending in the wrong direction.

Scenario two with Epic FHIR to OMOP ETL

This is the pattern most analytics teams need.

Take a blood pressure observation. In Epic FHIR, it may arrive as an Observation with code, value, unit, effective time, subject reference, encounter reference, and possibly nested components. That is interoperable. It is not yet OMOP-ready.

A practical ETL flow looks like this:

  1. Extract from FHIR: Pull Observation resources by patient or date window.
  2. Stage raw content: Store original JSON and response metadata.
  3. Flatten fields: Separate systolic and diastolic values if present as components.
  4. Normalize units and timestamps: Convert units consistently and standardize datetimes.
  5. Map vocabularies: Resolve source codes to OMOP standard concepts.
  6. Load target rows: Insert into MEASUREMENT with source values and mapped concepts.
  7. Reconcile and audit: Check row counts, duplicates, and concept coverage.
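Step 3 is where blood pressure gets interesting: systolic and diastolic usually arrive as components of a single Observation. A minimal sketch of that split, keyed on the standard LOINC component codes (8480-6 systolic, 8462-4 diastolic); the staging-row field names are illustrative:

```python
BP_COMPONENT_CODES = {"8480-6": "systolic", "8462-4": "diastolic"}

def split_bp_components(obs):
    """Split a blood-pressure Observation into one staging row per component."""
    rows = []
    for comp in obs.get("component", []):
        codings = (comp.get("code") or {}).get("coding") or []
        loinc = next((c.get("code") for c in codings
                      if c.get("code") in BP_COMPONENT_CODES), None)
        if loinc is None:
            continue  # unknown component: route to review, don't silently drop
        vq = comp.get("valueQuantity") or {}
        rows.append({
            "observation_id": obs.get("id"),
            "component": BP_COMPONENT_CODES[loinc],
            "source_code": loinc,
            "value": vq.get("value"),
            "unit": vq.get("unit"),
            "effective_datetime": obs.get("effectiveDateTime"),
        })
    return rows

# Illustrative payload with both components
obs = {
    "id": "bp-1",
    "effectiveDateTime": "2026-01-01T08:00:00Z",
    "component": [
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8480-6"}]},
         "valueQuantity": {"value": 120, "unit": "mm[Hg]"}},
        {"code": {"coding": [{"system": "http://loinc.org", "code": "8462-4"}]},
         "valueQuantity": {"value": 80, "unit": "mm[Hg]"}},
    ],
}
```

One FHIR Observation, two MEASUREMENT rows: exactly the mismatch a naive one-to-one mapping misses.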

Here, integration architecture choices matter. If you want a good mental model for extractor, transformer, and adapter boundaries, this piece on Integration Design Patterns is worth reviewing. Healthcare pipelines benefit from those same separation-of-concern principles.

What works and what usually fails

The strongest implementations treat extraction and standardization as separate layers.

What works:

  • Raw staging first: Keep the original FHIR payloads for replay and audit.
  • Incremental windows: Pull by stable timestamps or transaction boundaries.
  • Vocabulary service abstraction: Hide concept resolution behind one interface.
  • Idempotent loads: Re-running a job should not create duplicate OMOP facts.

What tends to fail:

  • Direct JSON-to-OMOP inserts: Too brittle for real data variation.
  • Inline hardcoded mappings: Fine for demos, painful in production.
  • Ignoring unit normalization: Especially risky for vitals and labs.
  • Assuming one Observation equals one MEASUREMENT row: Components, panels, and custom structures complicate that.

Key takeaway: If your target is OMOP, the pipeline is not complete when Epic returns valid FHIR. It is complete when every loaded row has reproducible semantic meaning.

Querying the Epic FHIR API with Code Examples

Many teams start with Patient and Observation. That is the right place to start because those two resources expose the usual edge cases quickly: identifiers, pagination, missing elements, date filters, and code systems.

Authentication pattern to expect

Epic uses OAuth 2.0 in SMART-aligned workflows. The exact launch and token flow depends on whether you are building an embedded app or a backend service, but the extractor logic after you obtain an access token is familiar HTTP.

In practice, keep these pieces configurable:

  • FHIR base URL
  • Access token source
  • FHIR version expectations
  • Client timeouts and retry policy
  • Per-resource pagination handling

Do not scatter these concerns across scripts. Put them behind one API client.
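A thin client class is enough to keep those concerns in one place. This is a generic sketch, not Epic-specific code; retry policy and token refresh are deliberately simplified, and the `token_source` callable is an assumption about how you manage credentials:

```python
import requests

class EpicFHIRClient:
    """Single home for base URL, auth, Accept header, and timeout policy."""

    def __init__(self, base_url, token_source, timeout=30):
        self.base_url = base_url.rstrip("/")
        self.token_source = token_source  # callable returning a fresh access token
        self.timeout = timeout            # retry policy would also live here

    def headers(self):
        return {
            "Authorization": f"Bearer {self.token_source()}",
            "Accept": "application/fhir+json",
        }

    def get(self, path, params=None):
        url = f"{self.base_url}/{path.lstrip('/')}"
        resp = requests.get(url, headers=self.headers(),
                            params=params, timeout=self.timeout)
        resp.raise_for_status()
        return resp.json()
```

Every extractor then calls `client.get("Observation", params={...})` instead of rebuilding headers and URLs in each script.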

Fetching a patient resource

The simplest useful request is a read by patient ID.

import requests

FHIR_BASE = "https://your-epic-fhir-base-url"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
PATIENT_ID = "example-patient-id"

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Accept": "application/fhir+json"
}

url = f"{FHIR_BASE}/Patient/{PATIENT_ID}"
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()

patient = response.json()

print(patient["resourceType"])
print(patient.get("id"))
print(patient.get("gender"))
print(patient.get("birthDate"))

A few production notes matter here.

First, do not assume a single MRN-style identifier appears in a fixed array position. Epic environments can expose multiple identifiers with different systems. Build identifier selection rules explicitly.
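An explicit preference list is one way to make those identifier rules concrete. The identifier systems below are placeholders; real values depend on the installation:

```python
def select_identifier(patient, preferred_systems):
    """Pick one identifier by an explicit system preference order,
    rather than trusting array position."""
    identifiers = patient.get("identifier", [])
    for system in preferred_systems:
        for ident in identifiers:
            if ident.get("system") == system:
                return ident.get("value")
    return None  # no preferred system found: flag for review, don't guess
```

Returning None instead of falling back to `identifier[0]` forces the gap into your exception queue, where someone can decide deliberately.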

Second, preserve the original resource body. When someone asks later why an OMOP person_source_value looks odd, the raw FHIR payload answers faster than reconstructed logs.

Searching for observations

For ETL, you usually search rather than read a single resource.

import requests

FHIR_BASE = "https://your-epic-fhir-base-url"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
PATIENT_ID = "example-patient-id"

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Accept": "application/fhir+json"
}

params = {
    "patient": PATIENT_ID,
    "_count": 50
}

url = f"{FHIR_BASE}/Observation"
response = requests.get(url, headers=headers, params=params, timeout=30)
response.raise_for_status()

bundle = response.json()

for entry in bundle.get("entry", []):
    resource = entry.get("resource", {})
    code = resource.get("code", {})
    print(resource.get("id"), code)

This is enough for sandbox testing. It is not enough for a production extractor.

Handling pagination correctly

FHIR search results often arrive as a Bundle with paging links. Always follow link relations rather than reconstructing the next URL yourself.

import requests

def fetch_all_observations(fhir_base, access_token, patient_id):
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/fhir+json"
    }

    next_url = f"{fhir_base}/Observation"
    params = {"patient": patient_id, "_count": 50}
    all_resources = []

    while next_url:
        response = requests.get(next_url, headers=headers, params=params, timeout=30)
        response.raise_for_status()
        bundle = response.json()

        for entry in bundle.get("entry", []):
            resource = entry.get("resource")
            if resource:
                all_resources.append(resource)

        next_link = None
        for link in bundle.get("link", []):
            if link.get("relation") == "next":
                next_link = link.get("url")
                break

        next_url = next_link
        params = None

    return all_resources

The line params = None after the first request is intentional. Once the server gives you a next-page URL, treat that URL as authoritative.

Pulling useful fields from Observation

A common first pass is to flatten a few key fields into a staging row.

def flatten_observation(obs):
    coding = ((obs.get("code") or {}).get("coding") or [])
    primary_code = coding[0] if coding else {}

    subject_ref = (obs.get("subject") or {}).get("reference")
    encounter_ref = (obs.get("encounter") or {}).get("reference")

    value_quantity = obs.get("valueQuantity") or {}
    effective_dt = obs.get("effectiveDateTime")

    return {
        "observation_id": obs.get("id"),
        "status": obs.get("status"),
        "code_system": primary_code.get("system"),
        "code": primary_code.get("code"),
        "display": primary_code.get("display"),
        "value": value_quantity.get("value"),
        "unit": value_quantity.get("unit"),
        "effective_datetime": effective_dt,
        "subject_reference": subject_ref,
        "encounter_reference": encounter_ref
    }

That works for simple observations. It does not cover components, interpretations, reference ranges, or non-quantity values. Build for the happy path first, but log every exception shape so your transform layer evolves from real payloads.
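One cheap way to "log every exception shape" is to count which value[x] variant each Observation actually carries, so the transform layer evolves from real payload distributions rather than guesses:

```python
from collections import Counter

def value_shape(obs):
    """Identify which value[x] variant (or component/absent form) is present."""
    for key in ("valueQuantity", "valueCodeableConcept", "valueString",
                "component", "dataAbsentReason"):
        if key in obs:
            return key
    return "no-value"

# Illustrative payloads; in practice, feed every staged Observation through
shape_counts = Counter()
for obs in [{"valueQuantity": {"value": 1}}, {"valueString": "positive"}, {}]:
    shape_counts[value_shape(obs)] += 1
```

A weekly report of `shape_counts` tells you exactly which transformation paths you still owe.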

Error handling that saves time later

The most expensive bugs are silent failures.

Use a client wrapper that records:

  • request URL
  • parameters
  • response status
  • Epic correlation headers if available
  • resource type
  • extract window
  • retry count

Also separate transient failure handling from semantic failure handling. A timeout and a malformed code mapping are not the same class of error.
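In code, that separation can be as simple as distinct exception types plus a status classifier. The retry thresholds below are a reasonable default, not Epic guidance:

```python
class TransientExtractError(Exception):
    """Retryable: timeouts, 429s, 5xx responses."""

class SemanticTransformError(Exception):
    """Not retryable: malformed codes, impossible units, broken references."""

def classify_http_status(status):
    """Decide whether an HTTP failure is worth retrying."""
    if status in (408, 429) or 500 <= status <= 599:
        return "transient"
    return "fatal"
```

Transient errors go back through the retry loop with backoff; semantic errors go to an exception queue with the offending payload attached.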

Tip: Before you optimize throughput, make your extractor replayable. In healthcare ETL, replay beats speed when you are debugging edge cases tied to PHI-safe logs and audited reruns.

The Vocabulary Mapping Challenge from Epic to OMOP

The API is the easy part.

The difficult part is proving that the code you extracted means the same thing as the concept you loaded into OMOP. That is where many Epic FHIR pipelines become fragile.

Why FHIR codes are not enough

FHIR gives you a structured coding envelope. A resource may carry a code system, code, display, and sometimes multiple codings. OMOP wants a standard concept strategy with consistent downstream semantics.

That creates several translation problems at once:

  • one source code may map cleanly
  • one source code may map ambiguously
  • a local or proprietary code may need custom logic
  • the code may be right but the unit or value representation may still block a valid OMOP load

Epic supports USCDI-oriented interoperability, but that does not mean every data element arrives in a form that drops directly into a standard analytics model.

A major example is flowsheet content and observation detail. A cited analysis notes that a significant portion of Observation codes do not align natively with ATHENA vocabularies, and that a substantial number of Epic-to-OMOP ETL questions in OHDSI discussions revolve around these vocabulary gaps, according to the summary linked from Epic FHIR Specifications.

The hidden work inside an Observation

An Observation can represent a lab result, a vital sign, a panel, a screening answer, or something custom. If you treat all of those as one class of record, your OMOP output degrades fast.

The problems usually show up in four places:

  • Coding: multiple codings or local extensions in Epic FHIR; OMOP needs one standard target concept.
  • Structure: valueQuantity vs component vs text; each shape takes a different transformation path.
  • Units: non-standard or local unit text; values must be normalized for valid analytics.
  • Provenance: encounter and performer references vary; OMOP needs stable lineage and source fields.

Common FHIR to OMOP targets

Here is a practical map for the resources teams use first.

  • Patient → PERSON: birth date, sex, source identifiers
  • Observation → MEASUREMENT or OBSERVATION: blood pressure, lab values, survey responses
  • Condition → CONDITION_OCCURRENCE: diabetes diagnosis, active problem
  • MedicationRequest → DRUG_EXPOSURE: prescribed medication order
  • AllergyIntolerance → OBSERVATION: recorded allergy or intolerance
  • Appointment → VISIT_OCCURRENCE or staging layer: scheduled outpatient encounter context

That table is deliberately simple. In real pipelines, some rows branch based on coding, value type, or organizational conventions.

For deeper background on concept-crosswalk strategy, this guide to vocabulary concept maps is a useful reference.

What works better than manual spreadsheets

Teams often begin with a spreadsheet of “known mappings.” That can help during profiling. It should not become your long-term mapping engine.

Better patterns are:

  • Vocabulary lookups in code: Resolve concepts as part of the transform step
  • Version-aware mapping rules: Tie outputs to a vocabulary release
  • Exception queues: Route unmapped or ambiguous records for review
  • Unit normalization libraries or rule sets: Keep measurement conversion outside the SQL load script

Poor patterns include hidden CSVs on shared drives, undocumented one-off SQL joins, and transformations that drop unknown codes without review.

Tip: Separate “unmapped” from “invalid.” An unmapped code might still be clinically important and worth preserving in source fields. Invalid data usually needs a different remediation path.
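A sketch of that separation as a triage function. The field names are illustrative staging columns, and the three routes map to load, review queue, and remediation queue respectively:

```python
def triage_record(row, known_codes):
    """Route a staged row: load, unmapped-review queue, or invalid queue."""
    if row.get("value") is None or row.get("source_code") is None:
        return "invalid"   # structurally unusable: needs data remediation
    if row["source_code"] not in known_codes:
        return "unmapped"  # clinically meaningful: needs vocabulary review
    return "load"
```

The key property is that "unmapped" records keep flowing into source-value columns instead of being silently discarded with the genuinely invalid ones.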

Simplifying Vocabulary Mapping with OMOPHub

The reason many teams postpone OMOP standardization is not conceptual. It is operational.

Maintaining a local vocabulary stack takes work. You need ingestion jobs, release management, indexing, relationship traversal logic, and a stable interface for ETL code to query concepts consistently. If your project is already wrestling with Epic-specific payloads, that extra infrastructure becomes one more moving part.


A cleaner pattern for ETL teams

A simpler approach is to treat vocabulary resolution as an API-backed service inside your transform layer.

The workflow is straightforward:

  1. Extract a source code from Epic FHIR, such as a LOINC code from Observation.code.coding
  2. Query a vocabulary service for the corresponding OMOP concept
  3. Store both the source representation and the resolved standard concept in your staging and load logic

For quick checks during development, the web-based Concept Lookup tool is useful when you want to inspect a code manually before automating it.

Python example using the SDK

The Python SDK is available at omophub-python. A practical pattern is to wrap concept resolution in a helper so the rest of your ETL stays clean.

from omophub import OMOPHub

client = OMOPHub(api_key="YOUR_API_KEY")

def lookup_standard_loinc(source_code):
    results = client.search_concepts(
        query=source_code,
        vocabulary=["LOINC"]
    )

    if not results:
        return None

    for concept in results:
        if concept.get("concept_code") == source_code:
            return {
                "concept_id": concept.get("concept_id"),
                "concept_name": concept.get("concept_name"),
                "vocabulary_id": concept.get("vocabulary_id")
            }

    return {
        "concept_id": results[0].get("concept_id"),
        "concept_name": results[0].get("concept_name"),
        "vocabulary_id": results[0].get("vocabulary_id")
    }

mapped = lookup_standard_loinc("8480-6")
print(mapped)

If your team works in R, the companion package is available at omophub-R.

The point is not the exact helper function. The point is that your ETL code can ask one consistent service for concept resolution instead of embedding local vocabulary assumptions in multiple scripts.

Where this helps most in Epic projects

This model is especially useful when:

  • Observations include multiple coding styles: You can centralize the logic that chooses which coding to prefer.
  • Flowsheet-derived content needs review: You can route uncertain codes through a single lookup pipeline.
  • Vocabulary updates matter: Your ETL can pin or record vocabulary version context.
  • Multiple languages are in play: Python and R teams can use the same conceptual service boundary.

One practical benefit is speed of iteration. When clinical reviewers flag a questionable mapping, you can adjust one lookup layer rather than hunt through notebooks, SQL, and ad hoc scripts.

The broader engineering lesson is simple. Vocabulary resolution should behave like a maintained dependency, not tribal knowledge.

Conclusion and Production Best Practices

Epic FHIR gives you a clean, modern way to reach clinical data. OMOP gives you a disciplined way to analyze that data. The difficult work sits between them.

The teams that succeed do not treat this as a one-step export. They build a pipeline that preserves raw FHIR payloads, stages data cleanly, normalizes structures, resolves vocabularies deliberately, and loads OMOP with traceable lineage.

Production habits worth keeping

  • Validate each Epic environment separately: Check the instance CapabilityStatement before relying on search parameters or interactions.
  • Keep raw payloads: You will need them for replay, audits, and edge-case debugging.
  • Design idempotent loads: Reruns should repair jobs, not duplicate facts.
  • Separate extraction from mapping: Different failure modes need different monitoring.
  • Track vocabulary decisions: Record source code, chosen standard concept, and rule version.
  • Treat custom codes as first-class work: Do not hide them in “miscellaneous” buckets.
  • Build exception handling early: Unmapped, invalid, and delayed records need different paths.
  • Test with realistic Observation shapes: Components, panel results, and local extensions surface issues fast.
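The "design idempotent loads" habit can be made concrete with a deterministic natural key, so a rerun overwrites a fact rather than duplicating it. The hashed fields below are illustrative; choose keys that match the grain of your MEASUREMENT rows:

```python
import hashlib

def measurement_natural_key(row):
    """Deterministic key so reruns update rather than duplicate facts."""
    parts = [
        str(row.get("person_source_value", "")),
        str(row.get("source_code", "")),
        str(row.get("effective_datetime", "")),
        str(row.get("value", "")),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()

def idempotent_load(rows, table):
    """Upsert into a table keyed by natural key (dict stands in for the DB)."""
    for row in rows:
        table[measurement_natural_key(row)] = row
    return table
```

In a real warehouse the same idea becomes a MERGE or an INSERT ... ON CONFLICT keyed on that hash.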

Key takeaway: A solid Epic FHIR pipeline is not the one that returns JSON fastest. It is the one that still makes semantic sense six months later when researchers rerun the cohort and ask how every concept was assigned.

Frequently Asked Questions

How should I handle Epic proprietary codes or custom extensions?

Keep them in staging exactly as received. Preserve source code, system, display, extension path, and raw JSON. Then send those records through a separate review queue instead of forcing a premature OMOP mapping.

Should I use real-time FHIR APIs or Bulk Data Access for OMOP ETL?

Use real-time resource queries for narrow incremental jobs, app workflows, and validation pulls. Use Bulk Data Access when the job is a large historical export or population-scale refresh. Many mature pipelines use both.

Where do unit conversions belong?

Put unit normalization in the transform layer, not inside the final OMOP load statement. That keeps conversions testable and easier to audit.
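A transform-layer rule table can be as small as this. The conversion pairs are illustrative; a real pipeline would version this table alongside its mapping rules:

```python
# Hypothetical, versionable conversion rules: (source_unit, target_unit) -> fn
UNIT_RULES = {
    ("g/dL", "g/L"): lambda v: v * 10,
    ("mg/dL", "mg/dL"): lambda v: v,
    ("lbs", "kg"): lambda v: v * 0.45359237,
}

def normalize_unit(value, source_unit, target_unit):
    """Convert via an explicit rule; refuse to guess when no rule exists."""
    rule = UNIT_RULES.get((source_unit, target_unit))
    if rule is None:
        raise ValueError(f"no conversion rule for {source_unit} -> {target_unit}")
    return rule(value)
```

Raising on unknown pairs keeps silent mis-scaled values out of MEASUREMENT, which is exactly the failure mode that is hardest to detect after load.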

What are the main security concerns?

Treat all Epic-derived data as regulated PHI unless your governance team explicitly says otherwise. Minimize scope, log access carefully, encrypt in transit and at rest, and keep auditability across extraction, staging, mapping, and load steps.

How do I reduce mapping drift over time?

Version your mapping logic, tie runs to vocabulary release context, and rerun validation queries after terminology updates. Drift is easier to catch when mappings are centrally managed and not scattered across scripts.


If your team is building an Epic FHIR to OMOP pipeline, OMOPHub can remove a major source of ETL friction by giving you developer-friendly access to OMOP vocabularies through APIs and SDKs. It is a practical way to replace local vocabulary infrastructure with a simpler lookup layer for concept search, relationship traversal, and mapping workflows.
