1. Easy (2 points)
Imagine you have a `customers` table in a transactional system, and a customer changes their email address. How would you handle this in a classic relational DB vs. in Data Vault? What is the main difference?
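Here is a minimal sketch of the contrast, using Python dicts and lists in place of tables. The names `hash_diff` and `load_date` follow common Data Vault naming but are assumptions here, and SHA-256 is only one possible hash choice. The classic model overwrites the email in place. A Data Vault satellite is insert-only and appends a new row whenever the descriptive attributes change.

```python
import hashlib
from datetime import datetime


def hash_diff(attrs: dict) -> str:
    # Hash over the descriptive attributes; used to detect whether anything changed.
    payload = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def update_in_place(table: dict, customer_id: str, email: str) -> None:
    # Classic relational approach: UPDATE overwrites the old value, so history is lost.
    table[customer_id]["email"] = email


def load_satellite(sat: list, hub_key: str, attrs: dict, load_date: datetime) -> bool:
    # Data Vault approach: insert-only; append a row only if the attributes changed.
    history = [r for r in sat if r["hub_key"] == hub_key]
    new_diff = hash_diff(attrs)
    if history and max(history, key=lambda r: r["load_date"])["hash_diff"] == new_diff:
        return False  # same as the latest version, nothing to insert
    sat.append({"hub_key": hub_key, "load_date": load_date, "hash_diff": new_diff, **attrs})
    return True
```

The old satellite row is never modified, so the full email history stays queryable by `load_date`.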
2. Easy (2 points)
You have a dbt project with 50+ models, and a build takes 2 hours. The business wants to refresh just one report every hour. How would you optimize this?
3. Medium (3 points)
You need to integrate data from an API that has a rate limit of 100 requests/minute, and you have 50,000 products to update. How would you design this in a dbt/Python pipeline?
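At 100 requests per minute, 50,000 single-product calls need at least 500 minutes (about 8.3 hours), so fetching only changed products or using a bulk endpoint (if the API offers one) matters. Below is a minimal sketch of client-side throttling with a sliding-window limiter. `fetch_one` stands in for a hypothetical API client call, and the clock and sleep functions are injectable so the limiter can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Iterable, List


class SlidingWindowRateLimiter:
    """Allows at most `max_calls` calls in any rolling `period`-second window."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: deque = deque()  # timestamps of calls inside the current window

    def acquire(self) -> None:
        while True:
            now = self.clock()
            # Forget calls that have fallen out of the rolling window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest call leaves the window, then re-check.
            self.sleep(self.period - (now - self.calls[0]))


def fetch_all(product_ids: Iterable[str], fetch_one: Callable[[str], dict],
              limiter: SlidingWindowRateLimiter) -> List[dict]:
    results = []
    for pid in product_ids:
        limiter.acquire()
        results.append(fetch_one(pid))  # hypothetical API client call
    return results
```

A production version would also retry HTTP 429 responses with backoff and checkpoint processed product IDs so an interrupted run can resume. Both are left out of this sketch.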
4. Medium (3 points)
In Data Vault you have 3 satellites per product from different sources (ERP, E-shop, CRM). The business wants one unified view. How would you build a "Golden Record", and how would you decide which source to trust when the sources conflict?
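One common approach is attribute-level survivorship. For each attribute, rank the sources by trust and take the first non-null value from the latest satellite row of each source. Below is a minimal sketch. The `SOURCE_PRIORITY` table is a hypothetical example; in practice the ranking would be agreed with the data owners.

```python
from typing import Dict, List, Optional

# Hypothetical per-attribute trust ranking, highest priority first.
SOURCE_PRIORITY: Dict[str, List[str]] = {
    "name": ["ERP", "ESHOP", "CRM"],
    "price": ["ERP", "ESHOP"],
    "description": ["ESHOP", "ERP", "CRM"],
}


def golden_record(latest_by_source: Dict[str, Dict[str, Optional[str]]]) -> Dict[str, Optional[str]]:
    """Build one record from the latest satellite row of each source.

    latest_by_source maps a source name to its current satellite attributes.
    For each attribute, the first non-null value in priority order wins.
    """
    record: Dict[str, Optional[str]] = {}
    for attr, priority in SOURCE_PRIORITY.items():
        chosen_value, chosen_source = None, None
        for source in priority:
            value = latest_by_source.get(source, {}).get(attr)
            if value is not None:
                chosen_value, chosen_source = value, source
                break
        record[attr] = chosen_value
        record[f"{attr}_source"] = chosen_source  # lineage of the chosen value
    return record
```

Storing `<attribute>_source` next to each value keeps the choice auditable when the business questions a field.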
5. Medium (3 points)
A dbt test fails in production but passes in the dev environment. How would you debug this, and what are the possible causes?
6. Hard (4 points)
The business needs a "Customer 360" report: all information about a customer (profile, orders, payments, support tickets) from the last 3 years. In Data Vault this requires 8+ JOINs, and the query takes 5 minutes. How would you optimize it to under 10 seconds?
7. Hard (4 points)
You discover that for the last 6 months duplicate records were loaded into the Data Vault because of an ETL bug. Now hub_customer contains duplicates (the same customer_id with different hashes). How would you clean this up without losing data or history?
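Below is a minimal sketch of a non-destructive fix. It assumes the bug produced hash keys from an inconsistently standardized business key; the trim-and-uppercase rule in `normalize_key` is an assumption, so substitute whatever standardization the corrected ETL applies. Every hub row is mapped to the hash of its normalized `customer_id`, the surviving hub row keeps the earliest `load_date`, and satellite rows are re-pointed rather than deleted, so no history is lost.

```python
import hashlib
from typing import Dict, List, Tuple


def normalize_key(customer_id: str) -> str:
    # Assumed standardization rule: trim whitespace and upper-case.
    return customer_id.strip().upper()


def hash_key(customer_id: str) -> str:
    return hashlib.sha256(normalize_key(customer_id).encode("utf-8")).hexdigest()


def dedupe_hub(hub_rows: List[dict]) -> Tuple[List[dict], Dict[str, str]]:
    """hub_rows: dicts with 'hash_key', 'customer_id', 'load_date' (ISO date strings).

    Returns the surviving hub rows and a remap {old hash key -> surviving hash key}.
    """
    remap: Dict[str, str] = {}
    survivors: Dict[str, dict] = {}
    for row in hub_rows:
        canonical = hash_key(row["customer_id"])
        remap[row["hash_key"]] = canonical
        kept = survivors.get(canonical)
        # ISO-8601 strings compare chronologically, so the earliest load wins.
        if kept is None or row["load_date"] < kept["load_date"]:
            survivors[canonical] = {"hash_key": canonical,
                                    "customer_id": normalize_key(row["customer_id"]),
                                    "load_date": row["load_date"]}
    return list(survivors.values()), remap


def repoint_satellite(sat_rows: List[dict], remap: Dict[str, str]) -> List[dict]:
    # Every satellite row is kept; only its parent hash key changes.
    return [{**r, "hash_key": remap.get(r["hash_key"], r["hash_key"])} for r in sat_rows]
```

Write the result into new tables (or take a backup first) and reconcile row counts before swapping. Re-pointed satellite rows that end up sharing the same key and `load_date` still need a tie-break rule.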
8. Hard (4 points)
You need to implement the GDPR "right to be forgotten" in a Data Vault architecture. A customer requests deletion of all their personal data. How would you do this without violating Data Vault principles (never delete)?
9. Easy (2 points)
What is the difference between batch processing and stream processing in data engineering?
10. Easy (2 points)
Explain the concept of data partitioning in distributed systems. Why is it important?
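Hash partitioning is one common scheme; range and list partitioning are others. Below is a minimal sketch that assigns keys to partitions with a stable CRC32 hash, so every process and machine computes the same partition for a given key. Python's built-in `hash()` would not work here, because it is salted per process for strings.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable


def partition_for(key: str, num_partitions: int) -> int:
    # Deterministic across processes and machines, unlike Python's salted hash().
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys: Iterable[str], num_partitions: int) -> Dict[int, int]:
    # Count how many keys land in each partition (useful for spotting skew).
    return dict(Counter(partition_for(k, num_partitions) for k in keys))
```

With modulo placement, changing `num_partitions` moves most keys to a different partition. Consistent hashing is a common way to limit that reshuffling.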
11. Easy (2 points)
What is the purpose of a data warehouse vs a data lake?
12. Medium (3 points)
Explain slowly changing dimensions (SCD) in data warehousing. What are the different types?
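Type 1 overwrites the attribute and keeps no history. Type 2 keeps one row per version with validity dates. Below is a minimal Type 2 sketch over a list of dicts. The `valid_from`/`valid_to`/`is_current` column names and the 9999-12-31 open-end date are conventions assumed here.

```python
from datetime import date
from typing import List, Optional

HIGH_DATE = date(9999, 12, 31)  # conventional open-ended valid_to


def apply_scd2(dim: List[dict], key: str, attrs: dict, effective: date) -> None:
    """SCD Type 2: close the current version and insert a new one if attributes changed."""
    current: Optional[dict] = next(
        (r for r in dim if r["key"] == key and r["is_current"]), None)
    if current is not None:
        if all(current.get(k) == v for k, v in attrs.items()):
            return  # no change, keep the current version open
        current["valid_to"] = effective
        current["is_current"] = False
    dim.append({"key": key, **attrs, "valid_from": effective,
                "valid_to": HIGH_DATE, "is_current": True})
```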
13. Medium (3 points)
How do you handle late-arriving data in a streaming pipeline?
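A common approach is event-time windowing with a watermark. The pipeline tracks the maximum event time seen, subtracts an allowed lateness, and finalizes windows whose end falls behind that watermark. Events arriving for already-finalized windows are routed to a side output instead of being silently dropped. Below is a minimal single-process sketch with integer timestamps. Stream processors such as Apache Flink provide watermarks, allowed lateness, and side outputs natively; the class is only an illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class WatermarkedWindowCounter:
    """Counts events per tumbling event-time window, tolerating bounded lateness."""

    def __init__(self, window_size: int, allowed_lateness: int):
        self.window_size = window_size
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.open_windows: Dict[int, int] = defaultdict(int)
        self.finalized: Dict[int, int] = {}
        self.late_events: List[Tuple[int, str]] = []  # side output for too-late events

    def watermark(self) -> float:
        return self.max_event_time - self.allowed_lateness

    def process(self, event_time: int, payload: str) -> None:
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark():
            self.late_events.append((event_time, payload))  # window already closed
            return
        self.open_windows[start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        # Finalize every window whose end is at or before the watermark.
        for s in sorted(self.open_windows):
            if s + self.window_size <= self.watermark():
                self.finalized[s] = self.open_windows.pop(s)
```

A larger `allowed_lateness` accepts more stragglers but delays final results. The side output lets a batch job reconcile the stragglers later.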
14. Medium (3 points)
What is data lineage and why is it important in data engineering?
15. Critical (5 points)
Design a real-time data pipeline that ingests 100,000 events/second from IoT devices, performs aggregations, and serves results with sub-second latency. Include architecture, technologies, and scaling strategies.
16. Critical (5 points)
You need to migrate a legacy 50TB data warehouse to the cloud while maintaining 24/7 availability. Design the migration strategy including data validation, rollback plan, and zero-downtime approach.
17. Critical (5 points)
Design a data quality framework for a large-scale data platform. Include anomaly detection, validation rules, automated remediation, and SLA monitoring.
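Below is a minimal sketch of two building blocks: row-level validation rules, and a volume anomaly check that compares today's row count with recent history using a z-score. The rule names and the 3-sigma threshold are illustrative assumptions. Automated remediation and SLA monitoring would act on the resulting `CheckResult` objects (for example by quarantining a load or alerting on-call) and are not shown.

```python
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def volume_anomaly(history: List[int], today: int, threshold: float = 3.0) -> CheckResult:
    """Flag today's row count if it is more than `threshold` std devs from the mean.

    `history` needs at least two data points for a sample standard deviation.
    """
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return CheckResult("row_count_zscore", today == mean, "constant history")
    z = (today - mean) / stdev
    return CheckResult("row_count_zscore", abs(z) <= threshold, f"z={z:.2f}")


def run_rules(rows: List[dict], rules: Dict[str, Callable[[dict], bool]]) -> List[CheckResult]:
    # Each rule is a predicate a valid row must satisfy; count the violations.
    results = []
    for name, rule in rules.items():
        failures = sum(1 for r in rows if not rule(r))
        results.append(CheckResult(name, failures == 0, f"{failures} failing rows"))
    return results
```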
18. Critical (5 points)
Implement a CDC (Change Data Capture) solution for a production database with millions of transactions daily. How do you handle initial snapshot, incremental changes, schema evolution, and ensure exactly-once delivery?
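Exactly-once delivery is usually built as at-least-once delivery plus an idempotent sink. Below is a minimal sketch. Each change event carries a `position` that is assumed to be strictly increasing across the snapshot and streaming phases (for example derived from the WAL/binlog offset), and the sink skips any event at or below the last applied position. The `op` codes loosely follow Debezium's convention (`r` snapshot read, `c` create, `u` update, `d` delete); the `ChangeEvent` and `IdempotentSink` classes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ChangeEvent:
    position: int               # assumed strictly increasing source log position
    op: str                     # "r" snapshot read, "c" insert, "u" update, "d" delete
    key: str
    after: Optional[dict] = None


@dataclass
class IdempotentSink:
    rows: Dict[str, dict] = field(default_factory=dict)
    last_position: int = -1     # must be committed atomically with `rows` in a real sink

    def apply(self, event: ChangeEvent) -> bool:
        if event.position <= self.last_position:
            return False        # redelivered event, already applied: skip
        if event.op in ("r", "c", "u"):
            self.rows[event.key] = dict(event.after or {})  # upsert
        elif event.op == "d":
            self.rows.pop(event.key, None)
        else:
            raise ValueError(f"unknown op {event.op!r}")
        self.last_position = event.position
        return True
```

If `last_position` is not committed in the same transaction as the row changes, a crash between the two writes reintroduces duplicates. Schema evolution is not handled in this sketch.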