Mastering Coronary Artery Disease Concept Map

David Thompson, PhD
May 13, 2026

Monday morning, a cardiology lead sends a “quick CAD list” with a dozen diagnosis codes. By Wednesday, the file has three tabs, mixed vocabularies, local notes in the margins, and no clear boundary between phenotype logic and implementation shortcuts. That is usually the point where a coronary artery disease concept map stops being a clinical reference and starts becoming a data engineering problem.

A coronary artery disease concept map should be built as a versioned asset your team can query and regenerate, not as a spreadsheet that drifts every time someone adds a code by hand. In OMOP work, that distinction matters. CAD rarely stays confined to one diagnosis domain. It touches condition concepts, procedures, drug exposures, lab measurement context, and the relationships that tie them together across standardized vocabularies.

For engineers new to OMOP, the hard part is not finding a few CAD codes. The hard part is deciding which concepts belong because of clinical intent, which belong because of vocabulary structure, and which should stay out even if they look related. A clear clinical phenotype definition in OMOP terms gives you that boundary. OMOPHub then lets you build the map through APIs, inspect relationship paths, and keep the result under source control like any other production artifact.

That approach changes how teams work. Instead of emailing revisions to a static list, you can trace each concept back to a seed, a relationship, a vocabulary constraint, and a review decision. It is easier to validate, easier to diff between releases, and much easier to explain when cohort counts shift after a vocabulary update.

CAD is common enough, and broad enough, that loose concept management causes real downstream problems. The cost is usually not obvious on day one. It shows up later as unexplained cohort inflation, condition records that do not align with procedure history, or drug concepts pulled in from a medication list that was never normalized upstream.

Why Your CAD Phenotype Needs a Programmatic Concept Map

A CAD phenotype built from a spreadsheet usually fails in predictable ways. It drifts out of date, nobody can tell which concepts were included by clinical intent versus convenience, and source-code mappings leak into logic that should have been standardized upstream. The result isn't just messy implementation. It's inconsistent cohorts.

A programmatic concept map fixes that by turning your phenotype into a graph with rules. Instead of saying “include these codes,” you define a seed concept, traverse known relationships, constrain by vocabulary and standard status, and export a reproducible artifact for downstream use. That's the difference between a one-off code list and an executable phenotype.
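One way to make "seed, traverse, constrain, export" concrete is to keep the rules themselves in a small, versionable spec. The sketch below is illustrative only: the PhenotypeSpec class, its field names, and the placeholder seed ID are assumptions made for this example, not part of OMOPHub or the OMOP CDM.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PhenotypeSpec:
    """Hypothetical container for the rules that regenerate a concept map."""
    name: str
    seed_concept_id: int               # placeholder; confirm the real seed first
    relationship_types: tuple          # edges the build is allowed to traverse
    vocabularies: tuple                # vocabularies allowed in the output
    standard_only: bool = True         # keep only standard concepts in the core
    vocabulary_release: str = "UNSET"  # record the release the build ran against


spec = PhenotypeSpec(
    name="coronary_artery_disease",
    seed_concept_id=0,  # illustrative placeholder, not a real concept ID
    relationship_types=("Subsumes",),
    vocabularies=("SNOMED",),
)

# Sorted keys keep the serialized spec stable, so it diffs cleanly in Git.
spec_json = json.dumps(asdict(spec), indent=2, sort_keys=True)
print(spec_json)
```

Committing a file like this next to the build script means a reviewer can see a rule change as a one-line diff instead of hunting through a spreadsheet.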

What static lists get wrong

Static lists are usually assembled from mixed sources. One tab may hold ICD-10-CM diagnoses, another may contain SNOMED findings, and a third may list drug names with no vocabulary normalization at all. Midway through implementation, the team realizes they're mixing source concepts and standard concepts in the same set.

That causes three practical failures:

  • Maintenance breaks first: hardcoded lists don't age well when vocabularies update or when a team revises CAD inclusion logic.
  • Traceability disappears: you can't easily explain why a concept is present, what relationship justified it, or when it entered the set.
  • Validation becomes manual: analysts spot odd cohort members late because there's no machine-readable semantic structure behind the phenotype.

Practical rule: If your phenotype can't be regenerated from code, it can't be trusted over time.

What a programmatic map gives you instead

A good coronary artery disease concept map acts like a controlled semantic layer for several workflows at once. ETL uses it to normalize source terms. Analysts use it to define cohorts. QA uses it to catch mappings that don't belong. Researchers use it to explain the phenotype in a way that survives peer review and handoff.

That's also why “phenotype” should be treated as more than a reporting label. If you need a useful refresher on that framing, OMOPHub's explanation of what phenotype means is a good reference point for the operational side of the term.

A practical CAD map usually includes more than the diagnosis node itself. It may connect to pathophysiology such as atherosclerosis, clinical manifestations such as angina, interventions such as PCI or CABG, and treatment domains such as statins or beta-blockers. Once represented as relationships instead of disconnected lists, those pieces become testable.

The engineering trade-off

There is a trade-off. Programmatic concept mapping takes more upfront discipline than opening Excel and pasting code lists together. You need naming conventions, version control, and a repeatable process for relationship traversal and filtering. But that investment pays back quickly because every later use becomes easier.

Use a spreadsheet if you're brainstorming. Don't use one as your system of record.

Setting Up Your OMOPHub Environment

The fastest way to make this work in practice is to start with a small script that can authenticate, search a concept, and print structured results. Once that works, the rest of the build becomes an extension of the same pattern.

Start with the API key and SDK

Generate an API key from your account dashboard first. Keep it in an environment variable rather than embedding it in notebooks or scripts. That sounds basic, but a surprising amount of vocabulary work gets shared around as notebooks with secrets copied into cells.

If you're working in Python, install the SDK from the OMOPHub Python client repository. If your team is R-heavy, use the OMOPHub R client repository. The same underlying workflow applies in either language.

A minimal Python setup looks like this:

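import os

from omophub import OMOPHub

# Read the key from the environment so it never lands in a notebook or repo.
# A missing variable raises KeyError here, before any API call is made.
client = OMOPHub(api_key=os.environ["OMOPHUB_API_KEY"])

# Method name as used in this article; confirm it against the current SDK docs.
results = client.search_concepts(q="CAD")

# Print the first few hits to inspect the response shape before writing logic.
for concept in results[:5]:
    print(concept)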

Use the SDK examples in the OMOPHub docs and LLM reference as your source of truth when you wire up calls. Don't rely on memory for method names. SDK signatures change less often than ad hoc examples copied between projects, but they still deserve a quick check.

What to verify before you write real logic

Before expanding into relationship traversal, verify four things in your environment:

  1. Authentication works: make one successful search call and inspect the response object.
  2. Your runtime handles pagination: concept searches and relationship calls can return more than you expect.
  3. You can filter cleanly: make sure your code can narrow by vocabulary and standard concept flags.
  4. You know where versions live: vocabulary version awareness should be designed in early, not added after the first disagreement in counts.

A simple readiness checklist helps:

| Check | Why it matters |
|---|---|
| API key from env var | Keeps secrets out of notebooks and repos |
| SDK installed in project env | Avoids local/global package confusion |
| Search call returns structured data | Confirms auth and basic connectivity |
| Logging enabled | Makes debugging traversal logic much easier |

Don't start by building the full CAD graph. Start by making one search call reliable and inspectable.
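Two of the checks above, pagination and clean filtering, are easy to rehearse offline before touching the real API. This is a minimal sketch under stated assumptions: fetch_page is a hypothetical callable, and the {"items": ..., "has_next": ...} response shape is invented for the example, so confirm the real pagination fields in the OMOPHub docs.

```python
def iter_all_pages(fetch_page):
    """Yield every item from a paged endpoint.

    fetch_page(page_number) is a hypothetical callable returning
    {"items": [...], "has_next": bool}. Real response shapes may differ.
    """
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["items"]
        if not body.get("has_next"):
            break
        page += 1


def keep_standard(concepts, vocabularies):
    """Keep concepts flagged standard ('S') that come from allowed vocabularies."""
    return [
        c for c in concepts
        if c.get("standard_concept") == "S"
        and c.get("vocabulary_id") in vocabularies
    ]


# Stand-in for API responses so the sketch runs without network access.
FAKE_PAGES = {
    1: {"items": [{"concept_id": 1, "vocabulary_id": "SNOMED",
                   "standard_concept": "S"}], "has_next": True},
    2: {"items": [{"concept_id": 2, "vocabulary_id": "ICD10CM",
                   "standard_concept": None}], "has_next": False},
}

all_concepts = list(iter_all_pages(lambda n: FAKE_PAGES[n]))
standard = keep_standard(all_concepts, {"SNOMED"})
print(len(all_concepts), [c["concept_id"] for c in standard])
```

Swapping the lambda for a real SDK call is the only change needed once the response shape has been confirmed.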

Common setup mistakes

The most common implementation mistake at this stage isn't clinical. It's procedural. Engineers often jump straight to “find all CAD-related concepts” before they've looked at a single response body. That usually leads to assumptions about fields, concept classes, or relationship labels that don't hold up.

Another mistake is treating the SDK as optional and writing raw HTTP calls immediately. Raw calls are fine when you need them, but the SDK usually gives you a cleaner, safer path to proof of concept. Get the phenotype logic right first. Optimize call patterns later.

Anchoring Your Map with Seed Concepts

Every useful coronary artery disease concept map starts with a stable anchor. If the seed concept is too broad, the graph bloats and your cohort gets noisy. If it's too narrow, you miss clinically relevant descendants and source mappings. Good work here saves pain later.

The practical starting point is a search for CAD across standardized vocabularies. A common first call is the REST endpoint /concepts/search?q=CAD, which retrieves SNOMED CT and ICD-10-CM mappings with typical response latency under 50 ms. A related pitfall is incomplete descendant expansion: omitting include_descendants=true can leave out 20 to 30% of ICD-10-CM I25 subcodes, which is why teams often rely on SDK helpers such as omop_hub.get_concept_descendants(concept_id) when building production logic.

Don't choose your anchor by text match alone

A text search is only the first pass. “CAD” can return abbreviations, related disorders, non-standard concepts, and source-specific labels that look plausible but don't belong as your canonical root. The job here is to separate search convenience from phenotype semantics.

Use the API to search, then filter for:

  • Vocabulary relevance: usually SNOMED CT for the standard clinical disease concept, with ICD-10-CM as an important source mapping domain.
  • Concept status: prioritize standard concepts where the OMOP model expects them.
  • Clinical scope: ensure the concept means coronary artery disease, not a neighboring construct like chest pain or acute coronary syndrome.

A simple exploratory pattern in Python:

results = client.search_concepts(q="Coronary artery disease")

for concept in results:
    if concept.get("vocabulary_id") in {"SNOMED", "ICD10CM"}:
        print(
            concept.get("concept_id"),
            concept.get("concept_name"),
            concept.get("vocabulary_id"),
            concept.get("standard_concept")
        )

Use the browser tool before you lock the code

Before you commit a seed concept into your repository, inspect it interactively in the OMOPHub Concept Lookup tool. This is especially useful when the text label looks obvious but the surrounding metadata tells a different story. You can quickly spot whether a concept is standard, whether it has useful descendants, and whether it sits where you expect in the vocabulary hierarchy.

For engineers new to OMOP, I also recommend reading OMOPHub's guide to OMOP concept mapping. It helps clarify why concept selection isn't just search-and-paste work.

A practical seed strategy

In most implementations, I'd anchor the map on the standard SNOMED disease concept and treat ICD-10-CM I25.* as a source-side bridge for ingestion and validation. That gives you one semantic center and one operational translation layer. It also keeps your downstream logic cleaner because you're not asking source vocabularies to behave like canonical phenotype definitions.

A useful pattern looks like this:

| Role in map | Recommended use |
|---|---|
| Standard disease anchor | SNOMED CAD concept |
| Source ingestion layer | ICD-10-CM I25.* family |
| Related clinical expansion | Descendants and semantic relationships |
| Validation targets | Known mappings and expected domains |

Start with one seed concept you can defend clinically and technically. Add breadth through relationships, not guesswork.
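The anchor-plus-bridge pattern can be written down as a small plan plus a source-side check. The sketch below is an assumption-laden illustration: seed_plan and is_cad_source_code are hypothetical names, and the anchor concept ID is deliberately left empty until it has been reviewed.

```python
def is_cad_source_code(icd10cm_code: str) -> bool:
    """True when a dotted ICD-10-CM code is I25 itself or sits under I25.*"""
    code = icd10cm_code.strip().upper()
    return code == "I25" or code.startswith("I25.")


# One semantic center and one operational translation layer.
seed_plan = {
    "standard_anchor": {"vocabulary_id": "SNOMED", "concept_id": None},  # fill in after review
    "source_bridge": {"vocabulary_id": "ICD10CM", "code_prefix": "I25"},
}

# Undotted codes such as "I250" are not matched; normalize formatting upstream.
samples = ["I25.10", "i25.2", "I20.0", "I250"]
flags = {code: is_cad_source_code(code) for code in samples}
print(flags)
```

Keeping the prefix check on the source side only is the point: the standard anchor, not the ICD-10-CM family, remains the definition of the phenotype.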

The gotcha most teams hit

The biggest early mistake is grabbing every result that “sounds like CAD” and calling the collection a concept map. That's not a map. It's a pile. A proper map needs an anchor and typed edges. Without that, you can't explain inclusion logic or reconstruct the phenotype when someone asks why a record was counted.

That discipline also makes review easier. A clinician can challenge the seed and the relationships separately. That's a much better conversation than reviewing a thousand-row spreadsheet with no rationale column.

Building the Network Through Semantic Relationships

A single concept doesn't give you enough structure for strong phenotype logic. The map becomes useful when you traverse outward through semantic relationships and build a constrained network around the disease. That's where diagnosis, pathophysiology, manifestations, and outcomes begin to connect.

A strong reason to do this programmatically is data consistency. A 2025 OHDSI study of 10 federated networks involving 50 million patients found that unstandardized CAD definitions caused 25 to 40% phenotype misclassification across EHRs, inflating incidence estimates by up to 15%, as described in the Concept Map document on Scribd. Even if your immediate use case is one institution, that finding captures a familiar reality. Manual definitions drift fast.

Relationship types worth traversing

Not every relationship belongs in every CAD graph. A useful engineering pattern is to decide upfront which relationship families matter for your use case and ignore the rest until there's a reason to expand.

For CAD, these relationship categories are often the most practical:

  • Causal or etiologic links: useful for connecting CAD with atherosclerosis.
  • Symptom or manifestation links: useful when you need associated findings such as angina.
  • Outcome links: useful for downstream analyses involving myocardial infarction and related events.
  • Hierarchical links: useful for subtype expansion and descendant handling.

The endpoint pattern often used for this is /concepts/{concept_id}/relationships?relationship_types=has_cause,has_symptom. In some implementations, that allows you to tie CAD to pathophysiology such as atherosclerosis, risk factors such as hypertension and hypercholesterolemia, and outcomes such as myocardial infarction. Treat has_cause and has_symptom as illustrative names, and confirm the relationship identifiers your environment actually returns before you hardcode them.

A retrieval pattern that scales

Don't recursively walk every available edge from the start. That produces noisy graphs and extra review work. Traverse in rings.

First, fetch direct relationships from the seed. Then review the returned concept families. Only after that should you recurse into selected branches such as manifestations or outcomes. This keeps the graph clinically interpretable.

A simplified Python pattern:

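# Placeholder seed taken from this article; confirm the concept ID in your
# environment before relying on it.
seed_id = 128670006

# Fetch only the direct (first-ring) relationships for the seed.
rels = client.get_concept_relationships(
    concept_id=seed_id,
    relationship_types=["has_cause", "has_symptom"]
)

edges = []         # typed edges: source, target, relationship type
nodes = {seed_id}  # unique concept IDs seen so far

for rel in rels:
    target = rel.get("target_concept_id")
    if target is None:
        # Skip malformed rows instead of adding a None node to the graph.
        continue
    edges.append({
        "source": seed_id,
        "target": target,
        "type": rel.get("relationship_type")
    })
    nodes.add(target)

print(f"nodes={len(nodes)} edges={len(edges)}")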

Once you have this, add filters before recursion:

  • keep only target vocabularies you trust for the task
  • exclude non-standard concepts when building the analytic core
  • separate “candidate” concepts from “approved” concepts in your output
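Traversing in rings with those filters applied can be sketched without the API at all. Everything below is a stand-in: FAKE_RELATIONSHIPS, FAKE_CONCEPTS, the concept IDs, and the relationship names are made up for illustration, and traverse_rings is a hypothetical helper rather than an SDK function.

```python
from collections import deque

# Hypothetical lookup tables standing in for relationship and concept calls.
FAKE_RELATIONSHIPS = {
    1: [(2, "has_cause"), (3, "has_symptom"), (4, "other")],
    2: [(5, "has_cause")],
    5: [(6, "has_cause")],
}
FAKE_CONCEPTS = {
    2: {"vocabulary_id": "SNOMED", "standard_concept": "S"},
    3: {"vocabulary_id": "SNOMED", "standard_concept": None},  # non-standard
    4: {"vocabulary_id": "SNOMED", "standard_concept": "S"},
    5: {"vocabulary_id": "SNOMED", "standard_concept": "S"},
    6: {"vocabulary_id": "SNOMED", "standard_concept": "S"},
}


def traverse_rings(seed_id, allowed_types, allowed_vocabularies, max_depth):
    """Expand outward from the seed, one ring at a time, up to max_depth."""
    ring_of = {seed_id: 0}  # concept_id -> ring number
    edges = []
    queue = deque([seed_id])
    while queue:
        current = queue.popleft()
        depth = ring_of[current]
        if depth >= max_depth:
            continue  # do not expand past the outermost allowed ring
        for target, rel_type in FAKE_RELATIONSHIPS.get(current, []):
            meta = FAKE_CONCEPTS.get(target)
            if meta is None or rel_type not in allowed_types:
                continue
            if meta["standard_concept"] != "S":
                continue
            if meta["vocabulary_id"] not in allowed_vocabularies:
                continue
            # Everything discovered starts as a candidate, never as approved.
            edges.append({"source": current, "target": target,
                          "type": rel_type, "status": "candidate"})
            if target not in ring_of:
                ring_of[target] = depth + 1
                queue.append(target)
    return ring_of, edges


ring_of, edges = traverse_rings(1, {"has_cause", "has_symptom"}, {"SNOMED"}, 2)
print(ring_of, len(edges))
```

The depth limit and the filters are the semantic boundary; without them the same loop would be the blind expansion this section warns against.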

What works and what doesn't

The pattern that works is controlled traversal with review gates. The pattern that fails is breadth-first expansion with no semantic boundaries. If you recurse blindly, you'll pull in concepts that are technically related but analytically wrong for your cohort.

Here's the trade-off in plain terms:

| Approach | What happens |
|---|---|
| Blind traversal of all relationships | Fast graph growth, poor specificity, hard review |
| Filtered traversal by relationship type | Smaller graph, better semantics, easier QA |
| Standard concepts only in core set | Cleaner analytics, fewer source-system surprises |
| Mixed standard and source concepts in one set | Confusing ETL behavior and duplicate logic |

A concept map should reflect clinical meaning, not API enthusiasm.

Distinguish graph layers

One trick that saves a lot of later cleanup is to store your graph in layers:

  1. Core disease layer for the canonical CAD concept and descendants.
  2. Clinical context layer for symptoms, causes, and sequelae.
  3. Operational layer for source vocabulary mappings used in ETL.

This keeps your analytics phenotype from becoming inseparable from ingestion logic. Analysts usually want the first layer and selected parts of the second. ETL often needs all three, but for different reasons.
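A layer assignment like the one above can be a plain function over node metadata. In this sketch the origin field is a hypothetical local tag recorded during traversal (for example "seed", "descendant", or the relationship that discovered the node); it is an assumption of the example, not an OMOP field.

```python
def assign_layer(node: dict) -> str:
    """Place a node in the core, clinical context, or operational layer."""
    if node.get("standard_concept") != "S":
        return "operational"       # source-side mappings used by ETL
    if node.get("origin") in {"seed", "descendant"}:
        return "core"              # canonical disease concept and descendants
    return "clinical_context"      # symptoms, causes, sequelae


# Placeholder nodes; IDs are illustrative only.
nodes = [
    {"concept_id": 1, "standard_concept": "S", "origin": "seed"},
    {"concept_id": 2, "standard_concept": "S", "origin": "descendant"},
    {"concept_id": 3, "standard_concept": "S", "origin": "has_symptom"},
    {"concept_id": 4, "standard_concept": None, "origin": "source_mapping"},
]

layers = {}
for node in nodes:
    layers.setdefault(assign_layer(node), []).append(node["concept_id"])
print(layers)
```

Analysts can then read only layers["core"] plus selected context concepts, while ETL reads all three.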

A review habit worth adopting

After every traversal iteration, export a small edge list and read it like a clinician would. If the graph can't be explained in a short sentence per edge type, it's probably too broad. Engineers often trust structure too quickly. Clinical concept maps still need human sense-checking.

Mapping CAD Treatments, Labs, and Procedures

A CAD map that only includes diagnosis concepts is useful, but incomplete. Real-world analytic work usually asks more than “who has CAD?” It asks who was treated, what monitoring they received, and what interventions followed. That requires crossing vocabulary domains deliberately.

A practical management-oriented build often starts by fetching pharmacotherapy concepts such as metoprolol (RxNorm 3558) and linking them to procedures such as PCI (SNOMED 75770000) through API relationships. Confirm both codes in your own vocabulary release before hardcoding them. One common pitfall is that ingredient-level RxNorm aggregation can miss 15 to 25% of combination therapies. Hierarchical rollups are the usual fix, as noted in the MSD Manual overview of coronary artery disease.

Medications first, but not as a flat list

Medication mapping often starts with a well-intentioned list of drug names. That's fine for brainstorming. It isn't enough for durable phenotype logic. The issue is granularity. Ingredient names, branded products, combinations, and class-derived descendants each serve different use cases.

If your analytic question is broad CAD management, build your medication branch with hierarchy in mind. If your question is narrow exposure analysis, stay closer to the ingredient or product level.

A practical approach:

  • Use ingredients for explainability: they're easier to review clinically.
  • Use rollups for completeness: they catch combinations and descendants that plain text lists miss.
  • Keep class logic separate: “beta blocker” and “statin” are useful abstractions, but they should be traceable to actual concepts.
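The gap between a flat ingredient list and a rollup is easiest to see on a toy table. The product names and the PRODUCT_INGREDIENTS mapping below are invented for illustration; in a real build the product-to-ingredient links would come from the vocabulary, not from a hand-written dictionary.

```python
# Hypothetical product -> ingredient sets, standing in for a vocabulary rollup.
PRODUCT_INGREDIENTS = {
    "product_a": {"metoprolol"},
    "product_b": {"metoprolol", "hydrochlorothiazide"},  # combination product
    "product_c": {"atorvastatin"},
}


def single_ingredient_products(ingredient):
    """Products whose only ingredient is the target (what a flat list finds)."""
    return sorted(p for p, ings in PRODUCT_INGREDIENTS.items()
                  if ings == {ingredient})


def products_containing(ingredient):
    """Products with the target among their ingredients (the rollup view)."""
    return sorted(p for p, ings in PRODUCT_INGREDIENTS.items()
                  if ingredient in ings)


flat = single_ingredient_products("metoprolol")
rolled_up = products_containing("metoprolol")
missed = sorted(set(rolled_up) - set(flat))
print(flat, rolled_up, missed)
```

Reporting the missed list during review makes the completeness trade-off visible instead of leaving it implicit in the concept set.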

If you want a vocabulary-specific refresher while working on the drug branch, OMOPHub's RxNorm code lookup article is a useful companion.

Procedures need a different mindset

Procedures don't behave like disease hierarchies. They're closer to interventions with context, coding variation, and workflow dependence. For CAD, PCI and CABG are the obvious anchors, but you still need to decide whether your map is aiming for phenotype enrichment, care pathway analysis, or intervention-based cohort segmentation.

That's why I recommend keeping procedure concepts in a sibling branch rather than mixing them directly into the disease core. They're related to CAD management, but they do not define CAD itself.

A clean pattern is:

management = client.get_related_concepts(
    concept_id=seed_id,
    rel_types=["treated_by", "prevents"]
)

for concept in management:
    print(
        concept.get("concept_name"),
        concept.get("vocabulary_id"),
        concept.get("concept_code")
    )

Then split the result into medication and procedure collections before review.
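A minimal sketch of that split, assuming each returned concept is a dictionary carrying an OMOP-style domain_id. The sample rows and their IDs are placeholders, and split_by_domain is a hypothetical helper.

```python
from collections import defaultdict


def split_by_domain(concepts):
    """Group concepts by domain_id so each branch can be reviewed on its own."""
    buckets = defaultdict(list)
    for concept in concepts:
        # Missing or empty domains go to "Unknown" rather than being dropped.
        buckets[concept.get("domain_id") or "Unknown"].append(concept)
    return dict(buckets)


# Placeholder rows standing in for an API response.
management = [
    {"concept_id": 11, "concept_name": "metoprolol", "domain_id": "Drug"},
    {"concept_id": 12, "concept_name": "percutaneous coronary intervention",
     "domain_id": "Procedure"},
    {"concept_id": 13, "concept_name": "unclassified item", "domain_id": None},
]

branches = split_by_domain(management)
medications = branches.get("Drug", [])
procedures = branches.get("Procedure", [])
print(sorted(branches), len(medications), len(procedures))
```

The "Unknown" bucket is worth keeping: anything that lands there is a signal to check the mapping before review starts.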

Labs are supporting structure, not just decoration

Laboratory concepts often get neglected because they aren't always part of the narrow phenotype definition. That's a mistake if the map is intended for analytics beyond prevalence counting. LOINC concepts for lipids and cardiac markers can provide important measurement branches for characterization and outcome studies.

Use labs to answer questions like:

| Lab branch use | Why it matters |
|---|---|
| Baseline risk characterization | Supports cohort description and stratification |
| Monitoring patterns | Helps analyze follow-up intensity |
| Event context | Helps tie diagnoses to surrounding clinical evidence |

The key is restraint. Don't add every cardiovascular lab just because it exists. Add the ones your downstream analyses will use.

Keep “disease definition” separate from “clinical context.” Both belong in the map, but they shouldn't live in the same bucket.

JSON export for downstream tools

Once you've built medication, lab, and procedure branches, serialize them into a structure your downstream tools can consume. JSON is usually the easiest neutral format. Include concept identifiers, names, vocabulary IDs, relationship types, and a local status flag such as approved, candidate, or excluded.

That approval state matters. Many organizations don't need fewer candidate concepts. They need a cleaner promotion path from discovered concept to production concept set.
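A promotion path can be as small as a table of allowed status transitions plus a function that records who made the decision. The transition rules, the promote helper, and the review fields below are assumptions for illustration; adapt them to your own governance process.

```python
# Hypothetical workflow: which status changes are allowed from each state.
ALLOWED_TRANSITIONS = {
    "candidate": {"approved", "excluded"},
    "approved": {"excluded"},
    "excluded": {"candidate"},
}


def promote(node, new_status, reviewer, reason):
    """Return a copy of the node with a new status and a review record."""
    current = node["status"]
    if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"{current} -> {new_status} is not allowed")
    updated = dict(node)  # leave the original node untouched
    updated["status"] = new_status
    updated["review"] = {"reviewer": reviewer, "reason": reason}
    return updated


candidate = {"concept_id": 21, "status": "candidate"}  # placeholder ID
approved = promote(candidate, "approved", reviewer="dr_a", reason="in scope")

# Skipping review on the way back is blocked by the transition table.
try:
    promote(approved, "candidate", reviewer="dr_a", reason="undo")
    blocked = False
except ValueError:
    blocked = True

print(approved["status"], blocked)
```

Because every change goes through one function, the review record travels with the concept into the JSON export.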

From API to Actionable Output: Visualization and Validation

A CAD graph sitting in memory is still a draft. The asset becomes useful when engineers can inspect it, diff it across vocabulary releases, and run it against incoming OMOP records as part of the pipeline.

For CAD, bad concept logic creates expensive downstream work. As noted earlier, this phenotype affects a large population and shows up across diagnoses, procedures, medications, and follow-up measurements. If the map shifts between builds, cohort counts, quality metrics, and model features shift with it.

Export for review and for code

Keep two outputs from the same OMOPHub build.

One output is for people reviewing the graph. A CSV node table and edge list usually work better than a screenshot because clinicians, analysts, and governance reviewers can filter, sort, and comment on specific concepts. Teams often load that file into Gephi or Cytoscape for a quick structural check, then use the flat files during approval.
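The reviewer-facing files can be produced with the standard library alone. This sketch writes to in-memory buffers so it runs anywhere; the column choices and sample rows are assumptions, and to_csv is a hypothetical helper.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Serialize dictionaries to CSV text, ignoring keys not in fieldnames."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder nodes and edges; IDs and names are illustrative only.
nodes = [
    {"concept_id": 1, "concept_name": "Example disease",
     "domain_id": "Condition", "status": "approved"},
    {"concept_id": 2, "concept_name": "Example symptom",
     "domain_id": "Condition", "status": "candidate"},
]
edges = [{"source": 1, "target": 2, "type": "has_symptom"}]

nodes_csv = to_csv(nodes, ["concept_id", "concept_name", "domain_id", "status"])
edges_csv = to_csv(edges, ["source", "target", "type"])
print(nodes_csv)
print(edges_csv)
```

Writing the two strings to nodes.csv and edges.csv gives reviewers flat files they can sort and comment on, and both load into graph tools that accept node and edge tables.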

The second output is the one your services and ETL jobs read directly. JSON is usually the right format because it preserves hierarchy, relationship metadata, and local workflow fields without forcing a second transformation step.

A useful JSON payload usually includes:

  • nodes with concept_id, concept_name, vocabulary_id, domain_id, standard_concept, status
  • edges with source_concept_id, target_concept_id, relationship_id, traversal_rule
  • metadata with phenotype_name, vocabulary_release, build_timestamp, commit_sha, builder_version

That commit_sha matters. If an analyst asks why the CAD denominator changed last week, you need to tie the exported concept map to an exact build, not a folder named final_v2.
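A payload with that shape can be assembled in a few lines. The build_payload function and all sample values below are hypothetical; the field names follow the list above, and the timestamp is passed in so that rebuilding from the same inputs yields byte-identical output.

```python
import json


def build_payload(nodes, edges, *, phenotype_name, vocabulary_release,
                  build_timestamp, commit_sha, builder_version):
    """Assemble a machine-readable concept map with sorted, stable ordering."""
    return {
        "metadata": {
            "phenotype_name": phenotype_name,
            "vocabulary_release": vocabulary_release,
            "build_timestamp": build_timestamp,
            "commit_sha": commit_sha,
            "builder_version": builder_version,
        },
        # Sorting makes two builds comparable with an ordinary text diff.
        "nodes": sorted(nodes, key=lambda n: n["concept_id"]),
        "edges": sorted(edges, key=lambda e: (e["source_concept_id"],
                                              e["target_concept_id"],
                                              e["relationship_id"])),
    }


payload = build_payload(
    nodes=[{"concept_id": 2, "status": "candidate"},
           {"concept_id": 1, "status": "approved"}],
    edges=[{"source_concept_id": 1, "target_concept_id": 2,
            "relationship_id": "has_symptom", "traversal_rule": "ring_1"}],
    phenotype_name="coronary_artery_disease",
    vocabulary_release="EXAMPLE-RELEASE",      # placeholder value
    build_timestamp="2026-01-01T00:00:00+00:00",  # placeholder value
    commit_sha="0000000",                      # placeholder value
    builder_version="0.1.0",
)
payload_json = json.dumps(payload, indent=2, sort_keys=True)
print(payload_json)
```

In a real pipeline the commit SHA and timestamp would be injected by the build job rather than typed by hand.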

Use the concept map inside the ETL

The strongest pattern is to treat the concept map as executable reference data. Load the approved CAD artifact during the ETL or normalization step and validate mapped concepts against it before they reach downstream marts.

In practice, that means the pipeline can:

  1. verify that a diagnosis normalized to a standard concept in the approved CAD disease layer
  2. confirm that revascularization procedures land in the expected procedure branch
  3. reject or quarantine concepts that matched source text rules but fall outside the approved graph
  4. log candidate concepts for review instead of dropping them without notification

This changes where errors surface. Engineers catch mapping drift during ingestion, not after a clinician notices an implausible dashboard trend.
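The four pipeline behaviors above reduce to one routing decision per record. The sketch below assumes the approved artifact has already been loaded into per-domain sets; APPROVED, CANDIDATES, the concept IDs, and check_record are all hypothetical.

```python
# Hypothetical approved artifact, keyed by OMOP domain.
APPROVED = {
    "Condition": {101, 102},
    "Procedure": {201},
}
CANDIDATES = {103}  # discovered but not yet reviewed


def check_record(record):
    """Route a mapped record to accept, review, or quarantine."""
    concept_id = record["concept_id"]
    domain = record["domain_id"]
    if concept_id in APPROVED.get(domain, set()):
        return "accept"
    if concept_id in CANDIDATES:
        return "review"      # log for review instead of dropping silently
    return "quarantine"      # matched upstream rules but sits outside the graph


records = [
    {"concept_id": 101, "domain_id": "Condition"},
    {"concept_id": 201, "domain_id": "Condition"},  # procedure in the wrong branch
    {"concept_id": 103, "domain_id": "Condition"},
    {"concept_id": 999, "domain_id": "Procedure"},
]
outcomes = [check_record(r) for r in records]
print(outcomes)
```

Checking the domain as well as the ID is what catches a revascularization concept that lands in the condition branch.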

Validation checks that hold up in production

| Validation point | Check |
|---|---|
| Source diagnosis mapping | Does the mapped standard concept exist in the approved CAD layer or its allowed descendants? |
| Procedure mapping | Does the concept fall inside the approved intervention branch? |
| Drug exposure normalization | Did the mapping resolve to the treatment branch you approved, not a nearby cardiovascular concept? |
| Unexpected concepts | Should the concept move to candidate review, or should the ETL exclude it outright? |

The best phenotype documentation runs as code. If the map cannot validate incoming records, it is only partial documentation.

Visual checks catch mistakes faster

Graph visualization is a QA tool, not a presentation artifact. A quick render will often expose problems that look harmless in a table, especially after automated relationship traversal from seed concepts.

Look for orphan nodes, cross-domain contamination, and branches that grew far beyond the intended phenotype boundary. I also color nodes by domain and edges by relationship type because that makes two common CAD mistakes obvious: procedure concepts creeping into the disease core, and broad ischemic heart disease concepts getting mixed with a narrower coronary artery disease definition.

Keep the review lightweight and repeatable. Export, render, inspect, fix the traversal rule, rebuild. That loop is much safer than hand-editing a spreadsheet after the fact.
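Two of those visual checks, orphan nodes and cross-domain contamination, can also run as plain assertions before anything is rendered. The node and edge shapes, the layer field, and both helper names are assumptions for this sketch.

```python
def find_orphans(nodes, edges):
    """Return IDs of nodes that no edge touches."""
    connected = {e["source"] for e in edges} | {e["target"] for e in edges}
    return sorted(n["concept_id"] for n in nodes
                  if n["concept_id"] not in connected)


def core_contamination(nodes, core_domain="Condition"):
    """Return IDs in the core layer whose domain is not the disease domain."""
    return sorted(n["concept_id"] for n in nodes
                  if n["layer"] == "core" and n["domain_id"] != core_domain)


# Placeholder graph; IDs are illustrative only.
nodes = [
    {"concept_id": 1, "domain_id": "Condition", "layer": "core"},
    {"concept_id": 2, "domain_id": "Procedure", "layer": "core"},      # misplaced
    {"concept_id": 3, "domain_id": "Condition", "layer": "clinical_context"},
    {"concept_id": 4, "domain_id": "Condition", "layer": "core"},      # no edges
]
edges = [{"source": 1, "target": 2}, {"source": 1, "target": 3}]

orphans = find_orphans(nodes, edges)
contaminated = core_contamination(nodes)
print(orphans, contaminated)
```

Running these checks in the build means the rendered graph confirms what the numbers already flagged.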

Expert Tips for Building Production-Ready Concept Maps

A CAD concept map should behave like any other maintained data asset. Another engineer should be able to pull the repo, run the build against the same OMOP vocabulary release, and get the same result. If that does not happen, the problem is usually not clinical logic. It is missing metadata, unclear approval state, or API assumptions that were never written down.

CAD definitions also shift across coding systems and over time. The NCBI Bookshelf review of coronary artery disease makes that history clear. Tie every exported concept set to a known vocabulary release so historical cohorts stay interpretable and rebuilds do not drift without notice.

Tips that hold up in production

  • Version the build inputs, not just the output: keep the vocabulary release, API query parameters, traversal rules, and build timestamp with the exported concept set. Check the OMOPHub documentation for the current version-handling patterns your pipeline should record.
  • Treat discovered concepts as candidates until reviewed: API expansion is useful, but automatic inclusion creates avoidable phenotype drift. Keep an approved layer separate from a candidate layer and require an explicit promotion step.
  • Code defensively against vocabulary edge cases: relationship calls can return empty arrays, deprecated concepts, domain surprises, or non-standard concepts that still look plausible. Handle those cases in the client so the build fails clearly, or quarantines the concept with a reason code.
  • Keep artifacts separated by purpose: disease-definition concepts, treatment concepts, procedure concepts, lab concepts, and source-to-standard mappings should be exportable on their own. That separation makes review faster and prevents one branch from changing another without notice.
  • Bias toward a narrower first release: teams new to OMOP vocabularies often over-expand early because the API makes concept traversal easy. A smaller map with clear inclusion rules is easier to validate, easier to diff in Git, and safer to put behind downstream cohort logic.

One practical warning from production work. Large concept graphs fail when nobody owns the boundary rules. A developer pulls descendants from a broad ischemic heart disease node, another adds treatment classes, and within a sprint the map starts including concepts that are clinically adjacent but out of scope for CAD.

Set branch limits deliberately. Document why a relationship type is allowed. Review graph growth at each vocabulary refresh.
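Reviewing graph growth at each refresh is simpler when the build reports a diff. This is a minimal sketch with hypothetical helper names and placeholder IDs; the growth threshold you alert on is a local policy choice, not something the vocabulary dictates.

```python
def diff_concept_sets(old_ids, new_ids):
    """Summarize what a rebuild added, removed, and kept."""
    old, new = set(old_ids), set(new_ids)
    return {
        "added": sorted(new - old),
        "removed": sorted(old - new),
        "unchanged": len(old & new),
    }


def growth_ratio(old_ids, new_ids):
    """Relative change in set size between two builds (0.25 means +25%)."""
    old_size = len(set(old_ids))
    if old_size == 0:
        raise ValueError("previous build is empty; nothing to compare against")
    return (len(set(new_ids)) - old_size) / old_size


previous_build = [1, 2, 3, 4]   # placeholder concept IDs
current_build = [2, 3, 4, 5, 6]

summary = diff_concept_sets(previous_build, current_build)
ratio = growth_ratio(previous_build, current_build)
print(summary, ratio)
```

A build job can fail, or open a review ticket, when the ratio crosses the limit the team has agreed on.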

If the map lives only as a spreadsheet export, that discipline usually slips. If it lives as code plus API calls plus review states, drift is easier to catch before it reaches ETL or cohort generation.

If you're tired of maintaining fragile vocabulary spreadsheets and want a cleaner way to build, traverse, and version clinical concept maps, OMOPHub is worth a look. It gives healthcare teams API-first access to OMOP standardized vocabularies, relationship traversal, and cross-vocabulary mappings without the overhead of standing up local ATHENA infrastructure.