Currently at DocNexus · Seattle, WA

Data Engineer
in healthcare.

5+ years building data infrastructure across US healthcare and finance. I work at the intersection of clinical intelligence, pharma, and engineering, making sense of messy, high-stakes data.

๐Ÿ“ Seattle, WA
โฑ 5+ years
๐Ÿ”ฌ BigQuery ยท FastAPI ยท AWS
Skills

Tech Stack

Data Warehousing
BigQuery · ClickHouse · CloudSQL · Redshift
Backend & APIs
FastAPI · Python · REST · SQLAlchemy
Cloud & Infra
AWS EC2 · S3 · GCP · Docker
Pipelines
Airflow · dbt · Batch Python
Governance
OpenMetadata · Neo4j · HIPAA · GDPR
Healthcare Data
NPI Registry · ClinicalTrials.gov · PubMed · ORCID
Work History

Experience

Current
DocNexus
Mar 2026 → Present
Software Engineer · Healthcare Commercial Intelligence · Seattle, WA
  • Built a multi-tier HCP-to-trial matching engine on BigQuery, scoring across NPI, institution, location, and ML embedding similarity against ClinicalTrials.gov.
  • Designed a tiered ClinicalTrials.gov v2 API integration replacing imprecise keyword search, dramatically improving trial coverage across physician cohorts.
  • Architected an HCP publication-matching pipeline with multi-dimensional scoring (affiliation, MeSH, co-author overlap, journal affinity) across PubMed datasets.
  • Built FastAPI services for real-time clinical data lookup, and manage EC2 batch pipelines for ophthalmology and oncology HCP datasets.
  • Delivering data products for pharma accounts including GSK/ViiV, Novartis, Takeda, and Chugai.
BigQuery · ClickHouse · FastAPI · AWS EC2 · ClinicalTrials v2 API · PubMed · NPI
Cogneo Technologies
Jul 2024 → Mar 2026
Software Engineer (SDE-II) · US Finance & Compliance · Remote
  • Led data governance and quality frameworks for US financial domain clients: metadata catalog, lineage graph, and automated quality checks.
  • Built compliance-focused pipelines with GDPR/CCPA controls using Airflow and graph-based data lineage with Neo4j.
  • Designed and maintained data dictionaries and audit trails for regulatory reporting across critical financial datasets.
  • Built an AI-assisted financial risk management system using Amazon Bedrock and S3 vector storage, helping financial institutions proactively manage risk.
OpenMetadata · Airflow · Neo4j · Amazon Bedrock · GDPR / CCPA · Python
TechVariable
Aug 2022 → Jul 2024
Software Engineer · Healthcare & Interoperability · Guwahati, India
  • Built a revenue cycle management (RCM) system for US healthcare, with analytics and predictions for claims and remits, using Apache Kafka and AWS.
  • Developed a healthcare interoperability solution that converts HL7 v2, CCD, FHIR, and EDI X12 messages into a canonical JSON format persisted to an RDBMS.
  • Worked on an event-driven architecture using Apache Kafka for real-time data streaming across healthcare data pipelines.
  • Collaborated with cross-functional teams managing task allocations, ETAs, and delivery of healthcare data products.
Apache Kafka · AWS · HL7 / FHIR · Python · Healthcare Standards · EDI X12
Oriental Outsourcing
Jun 2021 → Jul 2022
Software Engineer · Full-Stack Development · Chandigarh, India
  • Handled multiple client projects end-to-end, including an e-commerce platform for the steel industry with custom pricing, off-cut reuse optimisation, and XML order flows.
  • Owned business logic, unit testing, debugging, and API development across full-stack applications.
  • Provided ETAs and coordinated task allocation across client deliverables.
Python · MySQL · REST APIs · Full-Stack · Django
Work Samples

Selected Projects

🧬
Clinical Trials Matching Engine
Multi-tier HCP-to-trial scoring on BigQuery. Matches physicians using name, institution, location, and ML.DISTANCE embedding similarity. Runs in batch on EC2 across NPI cohorts.
BigQuery · ML.DISTANCE · EC2
📄
HCP Publication Matching
Multi-region BigQuery scoring (max 110 pts) linking physicians to PubMed publications via affiliation, MeSH, ORCID mapping, and co-author graph.
PubMed · BigQuery · ORCID API
⚡
Clinical Data API Service
FastAPI service for real-time provider lookup and tiered trial search using Essie query syntax. Specialty-specific keyword mappings. Async BigQuery clients per GCP region.
FastAPI · Async Python · BigQuery
๐Ÿ›๏ธ
Data Governance Platform
End-to-end governance stack for US financial clients: metadata catalog, lineage via Neo4j, automated quality checks, and GDPR/CCPA compliance reporting.
OpenMetadata · Neo4j · Airflow
🛡️
Financial Risk Management with AI
AI-powered system helping financial institutions proactively manage risk. Built with Amazon Bedrock for generative AI capabilities and S3 vector storage for semantic risk pattern detection.
Amazon Bedrock · S3 Vector · Python
๐Ÿฅ
Revenue Cycle Management โ€” US Healthcare
End-to-end analytics and prediction pipeline for healthcare Claims and Remits, processing real-time billing data to identify revenue leakage and optimise reimbursement cycles.
Apache Kafka · AWS · Healthcare RCM
🔗
Healthcare Interoperability Solution
Converted HL7 v2, CCD, FHIR R4, and EDI X12 clinical messages into a unified canonical JSON format with RDBMS persistence, enabling data exchange across disparate healthcare systems.
HL7 / FHIR · EDI X12 · Python
🛒
E-Commerce: Steel Industry
Full-stack e-commerce platform for a steel distributor with custom pricing, off-cut reuse optimisation, delivery charge approval workflows, and XML-based B2B order integration.
Python · MySQL · REST APIs
Motivation

What drives me

US Healthcare Complexity
Fragmented systems, high stakes, real patient impact. Exactly the kind of problem worth dedicating years to.
Clinical Intelligence
Connecting HCPs, trials, publications, and pharma signals into something actionable for the people who need it.
Infrastructure that lasts
Fast, reliable, and maintainable. I care about the long-term health of systems, not just shipping.
Governance & Trust
Data you can trace and defend, especially when the stakes are regulatory and quality matters.