Data Engineering & Analytics Platform
Corporate Support Functions › Information Technology · 17 L4 steps · 5 phases · 6 decision gates · Updated 2026-03-19 22:26
📊 Process Flow Diagram (BPMN)
📋 L4 Process Steps
| Step | Step Name | Role / Swim Lane | System | Input | Output | KPI | Decision Gate? | Exception Path? |
|---|---|---|---|---|---|---|---|---|
| **Phase 1** | | | | | | | | |
| 1.1 | Capture data source requirements from business | Data Product Manager | Confluence | Business analytics request or new system onboarding | Data source specification document with SLA and refresh frequency | Requirements captured within 5 business days of request; ≥90% completeness score | N | N |
| 1.2 | Assess source API availability and schema | Data Engineer | AWS Glue Data Catalog | Data source specification document | Schema discovery report; connector feasibility assessment | Schema discovery completed within 2 days; 100% of fields documented | Y | Y |
| 1.3 | Configure AWS Glue ETL connectors | Data Engineer | AWS Glue | Schema discovery report; source credentials in AWS Secrets Manager | Parameterised Glue job with incremental load logic | Connector build time ≤3 days; incremental load latency ≤15 min for streaming sources | N | Y |
| 1.4 | Ingest raw data to S3 landing zone | Data Engineer | AWS S3 | Source system data via Glue job or Kinesis Data Streams | Raw partitioned Parquet files in S3 landing zone with ingestion timestamp | Pipeline success rate ≥99.5% per month; P99 ingestion latency ≤30 min | N | Y |
| **Phase 2** | | | | | | | | |
| 2.1 | Execute automated data quality checks | Data Quality Engineer | Monte Carlo | Raw data files in S3 landing zone | DQ scorecard with row counts, null rates, freshness, schema drift alerts | DQ check coverage ≥95% of all active datasets; anomaly detection P95 latency ≤10 min | Y | Y |
| 2.2 | Quarantine and triage DQ failures | Data Quality Engineer | Monte Carlo | DQ failure alert from Monte Carlo | Quarantine record in S3 rejected zone; Jira ticket auto-created for source team | P1 DQ incidents triaged within 1 hour; resolution SLA ≤4 hours for operational data domains | N | Y |
| 2.3 | Register and classify dataset in data catalog | Data Governance Analyst | Collibra Data Intelligence Cloud | Validated dataset; DQ scorecard | Catalog entry with business glossary linkage, data owner, lineage graph, and retention policy | 100% of production datasets catalogued within 1 sprint of go-live; lineage coverage ≥80% of governed domains | Y | N |
| 2.4 | Apply PII masking and access controls | Data Governance Analyst | Amazon Macie | Dataset classification from Collibra; GDPR/CCPA data subject inventory | Masked/tokenised dataset in governed S3 zone; IAM policy attached to dataset | PII detection coverage ≥99% of passenger-facing domains; zero unmasked PII in non-production environments | N | Y |
| **Phase 3** | | | | | | | | |
| 3.1 | Develop dbt transformation models | Analytics Engineer | dbt Cloud | Governed raw data in S3; dimensional modelling spec from data product owner | dbt models (staging → intermediate → mart layers); compiled SQL lineage DAG | dbt model test pass rate ≥98%; model execution time within agreed SLA per domain (e.g. revenue mart ≤20 min) | N | Y |
| 3.2 | Run dbt model tests and schema validation | Analytics Engineer | dbt Cloud | dbt model compilation output | Test results report; row count reconciliation against source | Zero undetected primary-key duplicates in mart layer; row count variance ≤0.1% vs source | Y | Y |
| 3.3 | Load transformed data to Redshift data warehouse | Data Engineer | Amazon Redshift | Validated dbt mart tables in S3; COPY manifest | Populated Redshift dimensional schema with distribution and sort keys optimised | Load job SLA: ≤60 min for daily full refresh; query P90 response time ≤5 s on standard analyst workloads | N | Y |
| **Phase 4** | | | | | | | | |
| 4.1 | Design and build Tableau analytics dashboards | BI Developer | Tableau Server | Redshift mart tables; stakeholder wireframe sign-off | Tableau workbook with certified data source and row-level security filters | Dashboard load time ≤8 s on standard extract; certified data source reuse rate ≥60% across dashboards | N | Y |
| 4.2 | Obtain stakeholder sign-off on dashboard accuracy | Data Product Manager | Confluence | Published Tableau dashboard (UAT environment); reconciliation against source system report | Signed acceptance document; dashboard promoted to production Tableau Server | UAT cycle ≤5 business days; zero P1 metric errors post-production release | Y | N |
| 4.3 | Publish self-service data products to analytics portal | Data Product Manager | AWS S3 / Redshift | Certified Tableau dashboards; approved dbt mart definitions | Published data product with SLA, owner, lineage, and Collibra catalog linkage | Data product NPS ≥7/10 in quarterly survey; product adoption ≥80% of target business unit within 30 days of launch | N | N |
| **Phase 5** | | | | | | | | |
| 5.1 | Monitor pipeline health and SLA compliance | DataOps Engineer | AWS CloudWatch | Airflow DAG run metrics; Glue job logs; Redshift query performance insights | Real-time pipeline health dashboard; automated SLA breach alerts to PagerDuty | Pipeline SLA breach rate ≤0.5% per month; mean time to detect (MTTD) ≤5 min | Y | Y |
| 5.2 | Manage Redshift cluster capacity and cost governance | Cloud FinOps Analyst | AWS Cost Explorer | Monthly Redshift usage report; query advisor recommendations | Rightsizing recommendation; reserved instance purchase plan; cluster scaling action | Redshift cost per TB processed ≤$25; month-over-month cost variance ≤10%; idle cluster time ≤5% | N | Y |
| 5.3 | Execute platform incident response and root-cause analysis | DataOps Engineer | PagerDuty | SLA breach alert from CloudWatch; user-reported data outage | Incident resolution; post-mortem report in Confluence; pipeline fix deployed via CI/CD | MTTR for P1 data outages ≤2 hours; post-mortem completed within 48 hours; recurrence rate ≤10% for resolved incidents | N | Y |
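The partitioned landing-zone layout in step 1.4 (raw Parquet files keyed by source and ingestion timestamp) can be sketched as a small key-builder. This is an illustrative sketch, not the actual Glue job logic; the `landing_zone_key` name and the Hive-style prefix layout are assumptions.

```python
from datetime import datetime, timezone

def landing_zone_key(source: str, dataset: str, ingested_at: datetime, file_name: str) -> str:
    """Build a Hive-style partitioned S3 key for the raw landing zone.

    Partitioning by ingestion date keeps incremental crawls cheap and makes
    the ingestion timestamp recoverable from the path alone (step 1.4).
    """
    ts = ingested_at.astimezone(timezone.utc)
    return (
        f"landing/source={source}/dataset={dataset}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"ingested_at={ts:%Y%m%dT%H%M%SZ}/{file_name}"
    )
```

Date-based partitions also let downstream DQ checks (step 2.1) scan only the newest prefix rather than the whole landing zone.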
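The DQ scorecard of step 2.1 (row counts, null rates, freshness) reduces to a few aggregate checks. In production these run inside Monte Carlo; the sketch below is a minimal stand-in, and the `dq_scorecard` name, the 2% null-rate budget, and the `ingested_at` field convention are assumptions.

```python
from datetime import datetime, timezone

def dq_scorecard(rows, max_null_rate=0.02, max_staleness_hours=24, now=None):
    """Score a batch of records: row count, per-field null rate, freshness.

    `rows` is a list of dicts; each row carries an ISO-8601 `ingested_at`.
    Returns the scorecard plus human-readable alerts (step 2.1 output).
    """
    now = now or datetime.now(timezone.utc)
    fields = {f for row in rows for f in row} - {"ingested_at"}
    null_rates = {
        f: sum(1 for r in rows if r.get(f) is None) / len(rows) for f in fields
    }
    newest = max(datetime.fromisoformat(r["ingested_at"]) for r in rows)
    staleness_h = (now - newest).total_seconds() / 3600
    alerts = [
        f"null rate {rate:.1%} on {f}"
        for f, rate in null_rates.items() if rate > max_null_rate
    ]
    if staleness_h > max_staleness_hours:
        alerts.append(f"dataset stale by {staleness_h:.1f}h")
    return {"row_count": len(rows), "null_rates": null_rates,
            "staleness_hours": staleness_h, "alerts": alerts}
```

A non-empty `alerts` list is what would trigger the quarantine-and-triage path of step 2.2.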
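Step 2.4's tokenisation of PII fields can be illustrated with keyed hashing: the same input always yields the same token, so joins across datasets survive, while the raw value stays out of non-production zones. This is a simplified sketch, not the Amazon Macie workflow itself; the `tokenise`/`mask_rows` names and the 16-character token length are assumptions.

```python
import hashlib
import hmac

def tokenise(value: str, secret: bytes) -> str:
    """Deterministically tokenise a PII value with keyed HMAC-SHA256."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def mask_rows(rows, pii_fields, secret):
    """Return copies of `rows` with each PII field replaced by its token."""
    return [
        {k: tokenise(v, secret) if k in pii_fields and v is not None else v
         for k, v in row.items()}
        for row in rows
    ]
```

Keying the hash with a secret (rather than plain SHA-256) is what prevents dictionary attacks against low-entropy values such as email addresses.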
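The row-count reconciliation KPI in step 3.2 (variance ≤0.1% vs source) is a simple relative-difference check. A minimal sketch, assuming counts are already available from source and mart; the function name is illustrative, and in practice this would run as a dbt test.

```python
def reconcile_row_counts(source_count: int, mart_count: int,
                         max_variance: float = 0.001):
    """Check mart row count against source within a relative variance budget.

    The default 0.1% budget matches the step 3.2 KPI. Returns
    (passed, variance).
    """
    if source_count == 0:
        return mart_count == 0, 0.0
    variance = abs(mart_count - source_count) / source_count
    return variance <= max_variance, variance
```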
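Step 5.1's monthly SLA breach rate (budget ≤0.5%) can be computed directly from DAG run records. This is a sketch under assumed input shape: in reality the metrics come from CloudWatch/Airflow, and the `duration_min`/`sla_min` field names are hypothetical.

```python
def sla_breach_rate(runs, max_breach_rate=0.005):
    """Compute the pipeline SLA breach rate from a month of DAG run records.

    `runs` is a list of dicts with `duration_min` and `sla_min`. Flags a
    breach of the 0.5% monthly budget (step 5.1 KPI).
    """
    breaches = sum(1 for r in runs if r["duration_min"] > r["sla_min"])
    rate = breaches / len(runs) if runs else 0.0
    return {"breaches": breaches, "breach_rate": rate,
            "budget_exceeded": rate > max_breach_rate}
```

A `budget_exceeded` result is the condition that would page the DataOps engineer via PagerDuty and open the incident-response path of step 5.3.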
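The cost-governance KPI in step 5.2 (Redshift cost per TB processed ≤$25) is a unit-cost calculation. A minimal sketch, assuming the monthly cost and bytes-processed figures are pulled from AWS Cost Explorer and Redshift system tables; the function name and input units are assumptions.

```python
def cost_per_tb(monthly_cost_usd: float, bytes_processed: int,
                budget_usd: float = 25.0):
    """Compute Redshift unit cost per TB processed against the $25 budget
    (step 5.2 KPI). Uses binary TB (1 TB = 1024**4 bytes)."""
    tb = bytes_processed / 1024**4
    unit_cost = monthly_cost_usd / tb if tb else float("inf")
    return {"cost_per_tb": unit_cost, "within_budget": unit_cost <= budget_usd}
```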