Data Engineering & Analytics Platform
Corporate Support Functions › Information Technology · 17 L4 steps · 5 phases · 6 decision gates · Updated 2026-03-19 22:26
📊 Process Flow Diagram (BPMN)
📋 L4 Process Steps
| Step | Step Name | Role / Swim Lane | System | Input | Output | KPI | Decision Gate? | Exception Path? |
|---|---|---|---|---|---|---|---|---|
| **Phase 1** | | | | | | | | |
| 1.1 | Capture data source requirements from business | Data Product Manager | Confluence | Business analytics request or new system onboarding | Data source specification document with SLA and refresh frequency | Requirements captured within 5 business days of request; ≥90% completeness score | N | N |
| 1.2 | Assess source API availability and schema | Data Engineer | AWS Glue Data Catalog | Data source specification document | Schema discovery report; connector feasibility assessment | Schema discovery completed within 2 days; 100% of fields documented | Y | Y |
| 1.3 | Configure AWS Glue ETL connectors | Data Engineer | AWS Glue | Schema discovery report; source credentials in AWS Secrets Manager | Parameterised Glue job with incremental load logic | Connector build time ≤3 days; incremental load latency ≤15 min for streaming sources | N | Y |
| 1.4 | Ingest raw data to S3 landing zone | Data Engineer | AWS S3 | Source system data via Glue job or Kinesis Data Streams | Raw partitioned Parquet files in S3 landing zone with ingestion timestamp | Pipeline success rate ≥99.5% per month; P99 ingestion latency ≤30 min | N | Y |
| **Phase 2** | | | | | | | | |
| 2.1 | Execute automated data quality checks | Data Quality Engineer | Monte Carlo | Raw data files in S3 landing zone | DQ scorecard with row counts, null rates, freshness, schema drift alerts | DQ check coverage ≥95% of all active datasets; anomaly detection P95 latency ≤10 min | Y | Y |
| 2.2 | Quarantine and triage DQ failures | Data Quality Engineer | Monte Carlo | DQ failure alert from Monte Carlo | Quarantine record in S3 rejected zone; Jira ticket auto-created for source team | P1 DQ incidents triaged within 1 hour; resolution SLA ≤4 hours for operational data domains | N | Y |
| 2.3 | Register and classify dataset in data catalog | Data Governance Analyst | Collibra Data Intelligence Cloud | Validated dataset; DQ scorecard | Catalog entry with business glossary linkage, data owner, lineage graph, and retention policy | 100% of production datasets catalogued within 1 sprint of go-live; lineage coverage ≥80% of governed domains | Y | N |
| 2.4 | Apply PII masking and access controls | Data Governance Analyst | Amazon Macie | Dataset classification from Collibra; GDPR/CCPA data subject inventory | Masked/tokenised dataset in governed S3 zone; IAM policy attached to dataset | PII detection coverage ≥99% of passenger-facing domains; zero unmasked PII in non-production environments | N | Y |
| **Phase 3** | | | | | | | | |
| 3.1 | Develop dbt transformation models | Analytics Engineer | dbt Cloud | Governed raw data in S3; dimensional modelling spec from data product owner | dbt models (staging → intermediate → mart layers); compiled SQL lineage DAG | dbt model test pass rate ≥98%; model execution time within agreed SLA per domain (e.g. revenue mart ≤20 min) | N | Y |
| 3.2 | Run dbt model tests and schema validation | Analytics Engineer | dbt Cloud | dbt model compilation output | Test results report; row count reconciliation against source | Zero undetected primary-key duplicates in mart layer; row count variance ≤0.1% vs source | Y | Y |
| 3.3 | Load transformed data to Redshift data warehouse | Data Engineer | Amazon Redshift | Validated dbt mart tables in S3; COPY manifest | Populated Redshift dimensional schema with distribution and sort keys optimised | Load job SLA: ≤60 min for daily full refresh; query P90 response time ≤5 s on standard analyst workloads | N | Y |
| **Phase 4** | | | | | | | | |
| 4.1 | Design and build Tableau analytics dashboards | BI Developer | Tableau Server | Redshift mart tables; stakeholder wireframe sign-off | Tableau workbook with certified data source and row-level security filters | Dashboard load time ≤8 s on standard extract; certified data source reuse rate ≥60% across dashboards | N | Y |
| 4.2 | Obtain stakeholder sign-off on dashboard accuracy | Data Product Manager | Confluence | Published Tableau dashboard (UAT environment); reconciliation against source system report | Signed acceptance document; dashboard promoted to production Tableau Server | UAT cycle ≤5 business days; zero P1 metric errors post-production release | Y | N |
| 4.3 | Publish self-service data products to analytics portal | Data Product Manager | AWS S3 / Redshift | Certified Tableau dashboards; approved dbt mart definitions | Published data product with SLA, owner, lineage, and Collibra catalog linkage | Data product NPS ≥7/10 in quarterly survey; product adoption ≥80% of target business unit within 30 days of launch | N | N |
| **Phase 5** | | | | | | | | |
| 5.1 | Monitor pipeline health and SLA compliance | DataOps Engineer | AWS CloudWatch | Airflow DAG run metrics; Glue job logs; Redshift query performance insights | Real-time pipeline health dashboard; automated SLA breach alerts to PagerDuty | Pipeline SLA breach rate ≤0.5% per month; mean time to detect (MTTD) ≤5 min | Y | Y |
| 5.2 | Manage Redshift cluster capacity and cost governance | Cloud FinOps Analyst | AWS Cost Explorer | Monthly Redshift usage report; query advisor recommendations | Rightsizing recommendation; reserved instance purchase plan; cluster scaling action | Redshift cost per TB processed ≤$25; month-over-month cost variance ≤10%; idle cluster time ≤5% | N | Y |
| 5.3 | Execute platform incident response and root-cause analysis | DataOps Engineer | PagerDuty | SLA breach alert from CloudWatch; user-reported data outage | Incident resolution; post-mortem report in Confluence; pipeline fix deployed via CI/CD | MTTR for P1 data outages ≤2 hours; post-mortem completed within 48 hours; recurrence rate ≤10% for resolved incidents | N | Y |
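The partitioned landing-zone layout in step 1.4 (raw Parquet files keyed by source and ingestion timestamp) can be sketched as a small key-builder. This is an illustrative sketch, not the actual Glue job logic; the `landing_zone_key` name and the Hive-style prefix layout are assumptions.

```python
from datetime import datetime, timezone

def landing_zone_key(source: str, dataset: str, ingested_at: datetime, file_name: str) -> str:
    """Build a Hive-style partitioned S3 key for the raw landing zone.

    Partitioning by ingestion date keeps incremental crawls cheap and makes
    the ingestion timestamp recoverable from the path alone (step 1.4).
    """
    ts = ingested_at.astimezone(timezone.utc)
    return (
        f"landing/source={source}/dataset={dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"ingested_at={ts:%Y%m%dT%H%M%SZ}/{file_name}"
    )
```

Date-based partitions also let downstream DQ checks (step 2.1) scan only the newest prefix rather than the whole landing zone.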
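The DQ scorecard of step 2.1 (row counts, null rates, freshness) reduces to a few aggregate checks. In production these run inside Monte Carlo; the sketch below is a minimal stand-in, and the `dq_scorecard` name, the 2% null-rate budget, and the `ingested_at` field convention are assumptions.

```python
from datetime import datetime, timezone

def dq_scorecard(rows, max_null_rate=0.02, max_staleness_hours=24, now=None):
    """Score a batch of records: row count, per-field null rate, freshness.

    `rows` is a list of dicts; each row carries an ISO-8601 `ingested_at`.
    Returns the scorecard plus human-readable alerts (step 2.1 output).
    """
    now = now or datetime.now(timezone.utc)
    fields = {f for row in rows for f in row} - {"ingested_at"}
    null_rates = {
        f: sum(1 for r in rows if r.get(f) is None) / len(rows) for f in fields
    }
    newest = max(datetime.fromisoformat(r["ingested_at"]) for r in rows)
    staleness_h = (now - newest).total_seconds() / 3600
    alerts = [
        f"null rate {rate:.1%} on {f}"
        for f, rate in null_rates.items() if rate > max_null_rate
    ]
    if staleness_h > max_staleness_hours:
        alerts.append(f"dataset stale by {staleness_h:.1f}h")
    return {"row_count": len(rows), "null_rates": null_rates,
            "staleness_hours": staleness_h, "alerts": alerts}
```

A non-empty `alerts` list is what would trigger the quarantine-and-triage path of step 2.2.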
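Step 2.4's tokenisation of PII fields can be illustrated with keyed hashing: the same input always yields the same token, so joins across datasets survive, while the raw value stays out of non-production zones. This is a simplified sketch, not the Amazon Macie workflow itself; the `tokenise`/`mask_rows` names and the 16-character token length are assumptions.

```python
import hashlib
import hmac

def tokenise(value: str, secret: bytes) -> str:
    """Deterministically tokenise a PII value with keyed HMAC-SHA256."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def mask_rows(rows, pii_fields, secret):
    """Return copies of `rows` with each PII field replaced by its token."""
    return [
        {k: tokenise(v, secret) if k in pii_fields and v is not None else v
         for k, v in row.items()}
        for row in rows
    ]
```

Keying the hash with a secret (rather than plain SHA-256) is what prevents dictionary attacks against low-entropy values such as email addresses.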
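The row-count reconciliation KPI in step 3.2 (variance ≤0.1% vs source) is a simple relative-difference check. A minimal sketch, assuming counts are already available from source and mart; the function name is illustrative, and in practice this would run as a dbt test.

```python
def reconcile_row_counts(source_count: int, mart_count: int,
                         max_variance: float = 0.001):
    """Check mart row count against source within a relative variance budget.

    The default 0.1% budget matches the step 3.2 KPI. Returns
    (passed, variance).
    """
    if source_count == 0:
        return mart_count == 0, 0.0
    variance = abs(mart_count - source_count) / source_count
    return variance <= max_variance, variance
```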
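Step 5.1's monthly SLA breach rate (budget ≤0.5%) can be computed directly from DAG run records. This is a sketch under assumed input shape: in reality the metrics come from CloudWatch/Airflow, and the `duration_min`/`sla_min` field names are hypothetical.

```python
def sla_breach_rate(runs, max_breach_rate=0.005):
    """Compute the pipeline SLA breach rate from a month of DAG run records.

    `runs` is a list of dicts with `duration_min` and `sla_min`. Flags a
    breach of the 0.5% monthly budget (step 5.1 KPI).
    """
    breaches = sum(1 for r in runs if r["duration_min"] > r["sla_min"])
    rate = breaches / len(runs) if runs else 0.0
    return {"breaches": breaches, "breach_rate": rate,
            "budget_exceeded": rate > max_breach_rate}
```

A `budget_exceeded` result is the condition that would page the DataOps engineer via PagerDuty and open the incident-response path of step 5.3.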
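The cost-governance KPI in step 5.2 (Redshift cost per TB processed ≤$25) is a unit-cost calculation. A minimal sketch, assuming the monthly cost and bytes-processed figures are pulled from AWS Cost Explorer and Redshift system tables; the function name and input units are assumptions.

```python
def cost_per_tb(monthly_cost_usd: float, bytes_processed: int,
                budget_usd: float = 25.0):
    """Compute Redshift unit cost per TB processed against the $25 budget
    (step 5.2 KPI). Uses binary TB (1 TB = 1024**4 bytes)."""
    tb = bytes_processed / 1024**4
    unit_cost = monthly_cost_usd / tb if tb else float("inf")
    return {"cost_per_tb": unit_cost, "within_budget": unit_cost <= budget_usd}
```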