About
Sagis Diagnostics, a leading U.S. pathology lab, replaced its fragmented Azure SQL setup with a unified Databricks Lakehouse built by Dataforest. The migration consolidated 21 data sources, automated analytics, and ensured HIPAA compliance. The result was full data transparency, pay-per-use efficiency, and a roughly 50% reduction in compute costs.
Challenge, approach, and impact
Transform Legacy SQL Scripts into Functional Jobs
Static SQL scripts had to be turned into functional, automated jobs. The goals were more consistent data workflows, better scalability, and lower maintenance overhead.
Ensure Data Compliance in Databricks (Patient Data)
Sensitive patient data processed in the Databricks environment had to comply strictly with HIPAA and other healthcare data protection standards.
Achieve Full Observability Across Data Pipelines
The pipelines needed end-to-end visibility into how data is transformed, validated, and consumed. That visibility would let the team track issues, verify accuracy, and eliminate blind spots.
Implementation Challenges During Migration
Early in the migration, the team faced several obstacles: no clear documentation of the existing Azure setup, limited access rights, missing connectors, and the need to keep pace with Databricks platform updates.
Automated Databricks Job Conversion
All legacy SQL scripts were converted into automated Databricks jobs with error handling, scheduling, and integration into the business logic layer. Key deliverables included a Delta Lake-based medallion architecture and a cost-monitoring dashboard.
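The case study does not publish the job code. The sketch below illustrates the error-handling pattern, under the assumption that a legacy script can be split into an ordered list of DML statements. It runs those statements in a single transaction and retries the whole script on failure. Python's built-in `sqlite3` module stands in for the Databricks SQL engine. The names `run_sql_job`, `max_retries`, and `backoff_s` are hypothetical. Scheduling would come from the job orchestrator, not from this code.

```python
import logging
import sqlite3
import time

log = logging.getLogger("legacy_sql_job")  # hypothetical logger name


def run_sql_job(conn: sqlite3.Connection, statements: list[str],
                max_retries: int = 2, backoff_s: float = 0.0) -> int:
    """Run a legacy SQL script (assumed to be DML statements) as one
    transaction, retrying the whole script on failure.

    Returns the 1-based attempt number that succeeded; re-raises the last
    error once all retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            # The connection context manager commits on success and rolls
            # back on an exception, so a failed attempt leaves no partial
            # DML writes behind.
            with conn:
                for stmt in statements:
                    conn.execute(stmt)
            log.info("job succeeded on attempt %d", attempt + 1)
            return attempt + 1
        except sqlite3.Error as exc:
            log.warning("attempt %d failed: %s", attempt + 1, exc)
            if attempt == max_retries:
                raise
            # Exponential backoff between attempts.
            time.sleep(backoff_s * (2 ** attempt))
    raise AssertionError("unreachable")
```

Each failed attempt is rolled back, so retrying the full script is safe for DML. DDL statements executed outside an open transaction autocommit in `sqlite3`'s default mode, and they would need separate handling.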
Compliance-First Databricks Environment
HIPAA compliance was maintained by fully anonymizing the patient data used for AI/BI and ML model training within the Databricks environment.
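The anonymization method itself is not described. The sketch below assumes records arrive as Python dictionaries with hypothetical field names. It drops direct identifiers and generalizes two quasi-identifiers in the spirit of the HIPAA Safe Harbor method: birth dates become ages, with ages over 89 grouped as "90+", and ZIP codes are cut to three digits. It covers only an illustrative subset of the 18 Safe Harbor identifier categories. It also omits the rule that a three-digit ZIP prefix covering 20,000 or fewer people must be replaced with 000. It is not a complete de-identification procedure.

```python
from datetime import date

# Hypothetical direct-identifier field names; a real schema would differ.
DIRECT_IDENTIFIERS = {"patient_name", "mrn", "ssn", "phone", "email", "address"}


def deidentify(record: dict, today: date) -> dict:
    """Return a copy of `record` with direct identifiers removed and
    birth date / ZIP generalized (illustrative subset of Safe Harbor)."""
    # Copy every field except the direct identifiers; the input is untouched.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    dob = out.pop("birth_date", None)
    if dob is not None:
        # Age in whole years, subtracting one if the birthday hasn't occurred yet.
        age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
        # Ages over 89 are aggregated into a single "90+" category.
        out["age"] = "90+" if age > 89 else age

    zip_code = out.pop("zip", None)
    if zip_code is not None:
        # ZIP codes are assumed to be strings so leading zeros survive.
        out["zip3"] = str(zip_code)[:3]
    return out
```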
Data Lineage and Monitoring Dashboards
Automated data lineage and monitoring dashboards were implemented within Databricks. They provide real-time data refresh tracking, anomaly detection, and event-based alerts, which improve transparency, speed up troubleshooting, and increase data reliability.
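The monitoring logic behind the dashboards is not detailed. The sketch below assumes each table exposes a last-refresh timestamp and a history of row counts per refresh. It flags tables that are stale and row counts that deviate from recent history by more than a z-score threshold. Each resulting event goes to an alert callback. The z-score test is a stand-in technique chosen for illustration. The names `check_table` and `is_anomalous` and the default threshold of 3.0 are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev
from typing import Callable


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Return True if `latest` lies more than `threshold` sample standard
    deviations from the mean of `history` (a simple z-score check)."""
    if len(history) < 2:
        return False  # too little history to estimate the spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # any change from a perfectly flat history is unusual
    return abs(latest - mu) / sigma > threshold


def check_table(
    name: str,
    last_refresh: datetime,
    now: datetime,
    max_age: timedelta,
    row_history: list[float],
    latest_rows: float,
    alert: Callable[[str], None],
) -> list[str]:
    """Run freshness and volume checks for one table; alert once per event."""
    events = []
    if now - last_refresh > max_age:
        events.append(f"{name}: stale, last refreshed {last_refresh.isoformat()}")
    if is_anomalous(row_history, latest_rows):
        events.append(f"{name}: row count {latest_rows} outside expected range")
    for event in events:
        alert(event)  # e.g. post to a chat channel or paging system
    return events
```

Passing the alert sink as a callback keeps the checks independent of any particular notification service. In a test, `list.append` can collect the events for inspection.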
Incremental Implementation and Knowledge Transfer
The team incrementally reconstructed undocumented logic and standardized access management. It also introduced an adaptive update policy to stay in sync with Databricks releases, ensuring a smooth handover and long-term maintainability.
How we built
Testimonials
Anonymous
Dataforest
“Dataforest got us off the ground really quickly, and they even provided documentation without us having to ask for it. That was really impressive.”
Team structure
Client team
Sagis Diagnostics
2 x Project stakeholder
The client stakeholders at Sagis Diagnostics worked closely with the team at Dataforest.
Agency team
1 x Project Manager
Governance
