v1.0

Process Flow Diagram (BPMN)

CS-10 BPMN diagram

L4 Process Steps

Phase 1

| Step | Step Name | Role / Swim Lane | System | Input | Output | KPI | Decision? | Exception? |
|---|---|---|---|---|---|---|---|---|
| 1.1 | Capture data source requirements from business | Data Product Manager | Confluence | Business analytics request or new system onboarding | Data source specification document with SLA and refresh frequency | Requirements captured within 5 business days of request; ≥90% completeness score | N | N |
| 1.2 | Assess source API availability and schema | Data Engineer | AWS Glue Data Catalog | Data source specification document | Schema discovery report; connector feasibility assessment | Schema discovery completed within 2 days; 100% of fields documented | Y | Y |
| 1.3 | Configure AWS Glue ETL connectors | Data Engineer | AWS Glue | Schema discovery report; source credentials in AWS Secrets Manager | Parameterised Glue job with incremental load logic | Connector build time ≤3 days; incremental load latency ≤15 min for streaming sources | N | Y |
| 1.4 | Ingest raw data to S3 landing zone | Data Engineer | AWS S3 | Source system data via Glue job or Kinesis Data Streams | Raw partitioned Parquet files in S3 landing zone with ingestion timestamp | Pipeline success rate ≥99.5% per month; P99 ingestion latency ≤30 min | N | Y |
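Step 1.4's landing zone stores raw Parquet partitioned by source and ingestion timestamp. A minimal sketch of how such a partitioned S3 key might be derived (the `landing/` prefix, field names, and layout are illustrative assumptions, not the CS-10 standard):

```python
from datetime import datetime, timezone

def landing_key(source: str, table: str, ingested_at: datetime) -> str:
    """Build a Hive-style partitioned S3 key for the raw landing zone.

    Partitioning by source/table/date keeps crawler runs and partition
    pruning cheap; the exact scheme here is an assumption for illustration.
    """
    ts = ingested_at.astimezone(timezone.utc)
    return (
        f"landing/source={source}/table={table}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"part-{ts:%H%M%S}.parquet"
    )

key = landing_key("altea_pnr", "bookings",
                  datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc))
# → landing/source=altea_pnr/table=bookings/year=2024/month=03/day=05/part-143000.parquet
```

Keeping the ingestion timestamp in the key is what makes the P99 ingestion-latency KPI measurable after the fact.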
Phase 2

| Step | Step Name | Role / Swim Lane | System | Input | Output | KPI | Decision? | Exception? |
|---|---|---|---|---|---|---|---|---|
| 2.1 | Execute automated data quality checks | Data Quality Engineer | Monte Carlo | Raw data files in S3 landing zone | DQ scorecard with row counts, null rates, freshness, schema drift alerts | DQ check coverage ≥95% of all active datasets; anomaly detection P95 latency ≤10 min | Y | Y |
| 2.2 | Quarantine and triage DQ failures | Data Quality Engineer | Monte Carlo | DQ failure alert from Monte Carlo | Quarantine record in S3 rejected zone; Jira ticket auto-created for source team | P1 DQ incidents triaged within 1 hour; resolution SLA ≤4 hours for operational data domains | N | Y |
| 2.3 | Register and classify dataset in data catalog | Data Governance Analyst | Collibra Data Intelligence Cloud | Validated dataset; DQ scorecard | Catalog entry with business glossary linkage, data owner, lineage graph, and retention policy | 100% of production datasets catalogued within 1 sprint of go-live; lineage coverage ≥80% of governed domains | Y | N |
| 2.4 | Apply PII masking and access controls | Data Governance Analyst | Amazon Macie | Dataset classification from Collibra; GDPR/CCPA data subject inventory | Masked/tokenised dataset in governed S3 zone; IAM policy attached to dataset | PII detection coverage ≥99% of passenger-facing domains; zero unmasked PII in non-production environments | N | Y |
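The DQ scorecard in step 2.1 and the quarantine decision in step 2.2 can be sketched as a single check over required fields. This is a local illustration of the logic, not the Monte Carlo rule set; the field names and zero-null threshold are assumptions:

```python
def dq_scorecard(rows, required_fields, max_null_rate=0.0):
    """Compute a minimal DQ scorecard: row count and per-field null rate.

    Returns (scorecard, quarantine); quarantine is True when any required
    field exceeds the allowed null rate. Thresholds are illustrative.
    """
    n = len(rows)
    null_rates = {
        f: (sum(1 for r in rows if r.get(f) is None) / n) if n else 1.0
        for f in required_fields
    }
    quarantine = any(rate > max_null_rate for rate in null_rates.values())
    return {"row_count": n, "null_rates": null_rates}, quarantine

rows = [
    {"flight_no": "BA123", "delay_code": "WX"},
    {"flight_no": "BA456", "delay_code": None},  # null in a mission-critical field
]
card, quarantine = dq_scorecard(rows, ["flight_no", "delay_code"])
# quarantine is True: delay_code null rate is 0.5, above the 0.0 threshold
```

A zero tolerance on fields like `delay_code` reflects the OTP-reporting risk noted later: a single null cascades into incorrect regulatory reporting.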
Phase 3

| Step | Step Name | Role / Swim Lane | System | Input | Output | KPI | Decision? | Exception? |
|---|---|---|---|---|---|---|---|---|
| 3.1 | Develop dbt transformation models | Analytics Engineer | dbt Cloud | Governed raw data in S3; dimensional modelling spec from data product owner | dbt models (staging → intermediate → mart layers); compiled SQL lineage DAG | dbt model test pass rate ≥98%; model execution time within agreed SLA per domain (e.g. revenue mart ≤20 min) | N | Y |
| 3.2 | Run dbt model tests and schema validation | Analytics Engineer | dbt Cloud | dbt model compilation output | Test results report; row count reconciliation against source | Zero undetected primary-key duplicates in mart layer; row count variance ≤0.1% vs source | Y | Y |
| 3.3 | Load transformed data to Redshift data warehouse | Data Engineer | Amazon Redshift | Validated dbt mart tables in S3; COPY manifest | Populated Redshift dimensional schema with distribution and sort keys optimised | Load job SLA: ≤60 min for daily full refresh; query P90 response time ≤5 s on standard analyst workloads | N | Y |
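The step 3.2 row-count reconciliation KPI (variance ≤0.1% vs source) reduces to a simple tolerance check. A minimal sketch, assuming the counts are already collected; in practice this would run as a dbt test or post-load assertion, with per-domain tolerances:

```python
def reconcile_row_counts(source_count: int, mart_count: int,
                         tolerance: float = 0.001) -> bool:
    """Pass when the mart row count is within ±0.1% (default) of source."""
    if source_count == 0:
        return mart_count == 0
    variance = abs(mart_count - source_count) / source_count
    return variance <= tolerance

ok = reconcile_row_counts(1_000_000, 999_500)    # True: 0.05% variance
bad = reconcile_row_counts(1_000_000, 997_000)   # False: 0.3% variance
```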
Phase 4

| Step | Step Name | Role / Swim Lane | System | Input | Output | KPI | Decision? | Exception? |
|---|---|---|---|---|---|---|---|---|
| 4.1 | Design and build Tableau analytics dashboards | BI Developer | Tableau Server | Redshift mart tables; stakeholder wireframe sign-off | Tableau workbook with certified data source and row-level security filters | Dashboard load time ≤8 s on standard extract; certified data source reuse rate ≥60% across dashboards | N | Y |
| 4.2 | Obtain stakeholder sign-off on dashboard accuracy | Data Product Manager | Confluence | Published Tableau dashboard (UAT environment); reconciliation against source system report | Signed acceptance document; dashboard promoted to production Tableau Server | UAT cycle ≤5 business days; zero P1 metric errors post-production release | Y | N |
| 4.3 | Publish self-service data products to analytics portal | Data Product Manager | AWS S3 / Redshift | Certified Tableau dashboards; approved dbt mart definitions | Published data product with SLA, owner, lineage, and Collibra catalog linkage | Data product NPS ≥7/10 in quarterly survey; product adoption ≥80% of target business unit within 30 days of launch | N | N |
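Step 4.3 requires every published data product to carry an SLA, owner, lineage and Collibra linkage. A pre-publish completeness gate could be sketched as below; the field names are an assumed schema for illustration, not the portal's actual contract:

```python
REQUIRED_PRODUCT_FIELDS = {"name", "owner", "sla", "lineage", "catalog_link"}

def missing_product_fields(product: dict) -> set:
    """Return required metadata fields that are absent or empty,
    so publication can be blocked until the record is complete."""
    return {f for f in REQUIRED_PRODUCT_FIELDS if not product.get(f)}

draft = {"name": "revenue_mart", "owner": "dpm@example.com",
         "sla": "daily refresh by 06:00 UTC"}
missing = missing_product_fields(draft)  # {'lineage', 'catalog_link'}
```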
Phase 5

| Step | Step Name | Role / Swim Lane | System | Input | Output | KPI | Decision? | Exception? |
|---|---|---|---|---|---|---|---|---|
| 5.1 | Monitor pipeline health and SLA compliance | DataOps Engineer | AWS CloudWatch | Airflow DAG run metrics; Glue job logs; Redshift query performance insights | Real-time pipeline health dashboard; automated SLA breach alerts to PagerDuty | Pipeline SLA breach rate ≤0.5% per month; mean time to detect (MTTD) ≤5 min | Y | Y |
| 5.2 | Manage Redshift cluster capacity and cost governance | Cloud FinOps Analyst | AWS Cost Explorer | Monthly Redshift usage report; query advisor recommendations | Rightsizing recommendation; reserved instance purchase plan; cluster scaling action | Redshift cost per TB processed ≤$25; month-over-month cost variance ≤10%; idle cluster time ≤5% | N | Y |
| 5.3 | Execute platform incident response and root-cause analysis | DataOps Engineer | PagerDuty | SLA breach alert from CloudWatch; user-reported data outage | Incident resolution; post-mortem report in Confluence; pipeline fix deployed via CI/CD | MTTR for P1 data outages ≤2 hours; post-mortem completed within 48 hours; recurrence rate ≤10% for resolved incidents | N | Y |
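The step 5.1 KPIs (breach rate ≤0.5%, MTTD ≤5 min) can be computed from pipeline run and incident records. A local sketch with assumed record shapes; in the process itself these figures would come from CloudWatch metrics and PagerDuty incident timestamps:

```python
from datetime import datetime, timedelta

def sla_breach_rate(runs):
    """Fraction of pipeline runs that breached their SLA (target ≤0.5%)."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r["breached"]) / len(runs)

def mean_time_to_detect(incidents):
    """Mean gap between breach occurrence and alert firing (target ≤5 min)."""
    gaps = [i["detected_at"] - i["occurred_at"] for i in incidents]
    return sum(gaps, timedelta()) / len(gaps)

runs = [{"breached": False}] * 398 + [{"breached": True}] * 2
incidents = [
    {"occurred_at": datetime(2024, 3, 5, 6, 0), "detected_at": datetime(2024, 3, 5, 6, 4)},
    {"occurred_at": datetime(2024, 3, 5, 9, 0), "detected_at": datetime(2024, 3, 5, 9, 6)},
]
sla_breach_rate(runs)           # 0.005 → exactly at the 0.5% threshold
mean_time_to_detect(incidents)  # 5 minutes → exactly at the MTTD target
```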

Process Attributes

Identification

Process ID: CS-10
L1 Domain: Corporate Support Functions
L2 Process: Information Technology
L3 Name: Data Engineering & Analytics Platform
L4 Steps: 17 across 5 phases
Decision Gates: 6 (all with iteration loops)
Exceptions: 13 documented

Swim Lanes (Roles)

Data Product Manager
Data Engineer
Data Quality Engineer
Data Governance Analyst
Analytics Engineer
BI Developer
DataOps Engineer
Cloud FinOps Analyst

Systems & Tools

Confluence
AWS Glue Data Catalog
AWS Glue
AWS S3
Monte Carlo
Collibra Data Intelligence Cloud
Amazon Macie
dbt Cloud
Amazon Redshift
Tableau Server
AWS S3 / Redshift
AWS CloudWatch
AWS Cost Explorer
PagerDuty

Key Performance Indicators

Capture data source requirements from business: Requirements captured within 5 business days of request; ≥90% completeness score
Assess source API availability and schema: Schema discovery completed within 2 days; 100% of fields documented
Configure AWS Glue ETL connectors: Connector build time ≤3 days; incremental load latency ≤15 min for streaming sources
Ingest raw data to S3 landing zone: Pipeline success rate ≥99.5% per month; P99 ingestion latency ≤30 min
Execute automated data quality checks: DQ check coverage ≥95% of all active datasets; anomaly detection P95 latency ≤10 min
Quarantine and triage DQ failures: P1 DQ incidents triaged within 1 hour; resolution SLA ≤4 hours for operational data domains
Register and classify dataset in data catalog: 100% of production datasets catalogued within 1 sprint of go-live; lineage coverage ≥80% of governed domains
Apply PII masking and access controls: PII detection coverage ≥99% of passenger-facing domains; zero unmasked PII in non-production environments
Develop dbt transformation models: dbt model test pass rate ≥98%; model execution time within agreed SLA per domain (e.g. revenue mart ≤20 min)
Run dbt model tests and schema validation: Zero undetected primary-key duplicates in mart layer; row count variance ≤0.1% vs source
Load transformed data to Redshift data warehouse: Load job SLA ≤60 min for daily full refresh; query P90 response time ≤5 s on standard analyst workloads
Design and build Tableau analytics dashboards: Dashboard load time ≤8 s on standard extract; certified data source reuse rate ≥60% across dashboards
Obtain stakeholder sign-off on dashboard accuracy: UAT cycle ≤5 business days; zero P1 metric errors post-production release
Publish self-service data products to analytics portal: Data product NPS ≥7/10 in quarterly survey; product adoption ≥80% of target business unit within 30 days of launch
Monitor pipeline health and SLA compliance: Pipeline SLA breach rate ≤0.5% per month; mean time to detect (MTTD) ≤5 min
Manage Redshift cluster capacity and cost governance: Redshift cost per TB processed ≤$25; month-over-month cost variance ≤10%; idle cluster time ≤5%
Execute platform incident response and root-cause analysis: MTTR for P1 data outages ≤2 hours; post-mortem completed within 48 hours; recurrence rate ≤10% for resolved incidents

Airline-Specific Risks & Pain Points

Airline ops systems (Amadeus Altéa, Jeppesen) use proprietary APIs with limited documentation, causing specification rework
PSS and crew management systems use batch SFTP exports rather than real-time APIs, creating up to 24-hour data latency for operational dashboards
GDS feed formats (Sabre, Travelport) change without advance notice, breaking Glue jobs and requiring emergency patching during revenue-critical periods
Amadeus Altéa PNR change events generate burst loads during irregular operations (IROPs), overwhelming Kinesis shards and causing data loss if auto-scaling is not pre-configured
Flight operations feeds contain mission-critical fields (flight status, delay codes) where a single null cascades into incorrect OTP reporting to DOT, creating regulatory exposure
Crew system data gaps post-irregular ops (mirrors Dec 2022 SkySolver failure pattern) create silent DQ failures where crew assignment data appears complete but contains stale records

Inputs / Outputs

Primary Input: Business analytics request or new system onboarding
Primary Output: Incident resolution; post-mortem report in Confluence; pipeline fix deployed via CI/CD

Previous: CS-09 · Cybersecurity & Threat Management
Next: CS-11 · Strategic Sourcing & Vendor Selection