What is k-Fold Cross-Validation?
What is the difference between supervised and unsupervised learning?
What is reinforcement learning? What is it used for?
Definition: Learning paradigm where agent learns to make decisions by interacting with environment and receiving rewards/penalties
Key components: Agent, environment, state, action, reward signal - agent learns optimal policy to maximize cumulative reward
Game AI: Playing chess, Go, video games (AlphaGo, OpenAI Dota 2) - learning winning strategies through self-play
Robotics & Control: Robot navigation, autonomous vehicles, resource management, optimizing complex sequential decisions
Business applications: Recommendation systems, trading strategies, resource allocation, dynamic pricing
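Illustration: The agent/environment/reward loop above can be sketched with tabular Q-learning on a toy problem. The five-state corridor, the reward of 1 at the goal, and the hyperparameters below are assumptions chosen for illustration only; real applications use far larger state spaces and usually function approximation.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal (assumed toy setup)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Behave randomly; Q-learning is off-policy, so it still learns
            # the values of the greedy policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy policy: the action with the highest learned value in each non-terminal state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The learned policy is read off the table by taking the highest-valued action per state, which is the "optimal policy to maximize cumulative reward" in miniature.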
What is data/class imbalance? What consequences does it have? How do you address them?
Definition: When classes in dataset are not equally represented (e.g., 95% negative, 5% positive cases)
Consequences: Model biased toward majority class, poor performance on minority class, misleading accuracy metrics
Resampling: Oversampling minority class (SMOTE), undersampling majority class, or combination of both
Algorithm techniques: Class weights, cost-sensitive learning, ensemble methods (balanced random forest)
Better metrics: Use precision, recall, F1-score, and PR-AUC instead of accuracy for imbalanced datasets; ROC-AUC is also common but can look optimistic under heavy imbalance
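Illustration: A minimal sketch of why accuracy misleads on imbalanced data, using the 95%/5% split from the definition above. The "always predict negative" baseline and the helper names are hypothetical and exist only for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [0] * 95 + [1] * 5          # 95% negative, 5% positive
always_negative = [0] * 100          # a "model" that ignores the minority class
baseline = metrics(y_true, always_negative)
print(baseline)
```

The baseline scores 95% accuracy while its recall and F1 are zero, which is the gap the bullet on metrics describes.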
What is feature engineering?
Definition: Process of creating, transforming, and selecting features from raw data to improve model performance
Techniques: Creating new features (interactions, polynomials), transforming existing ones (scaling, encoding), extracting from text/dates
Importance: Often more impactful than algorithm choice, requires domain knowledge, can significantly boost model accuracy
Examples: One-hot encoding categorical variables, extracting day/month from dates, creating ratio features, binning continuous variables
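Illustration: The four examples above, sketched in plain Python on a single made-up record. Field names, categories, and bin edges are assumptions for illustration; in practice the same steps are usually done with pandas or scikit-learn.

```python
from bisect import bisect_right
from datetime import date


def one_hot(prefix, value, categories):
    """One indicator column per known category."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


def engineer(record):
    features = {}
    # One-hot encode a categorical variable.
    features.update(one_hot("color", record["color"], ("red", "green", "blue")))
    # Extract parts of a date.
    features["signup_month"] = record["signup"].month
    features["signup_weekday"] = record["signup"].weekday()  # Monday = 0
    # Ratio feature (guarding against division by zero).
    features["debt_to_income"] = record["debt"] / record["income"] if record["income"] else 0.0
    # Bin a continuous variable: edges 18, 35, 60 give bins 0..3.
    features["age_bin"] = bisect_right((18, 35, 60), record["age"])
    return features


raw = {"color": "red", "signup": date(2024, 3, 15), "income": 60000.0, "debt": 15000.0, "age": 42}
features = engineer(raw)
print(features)
```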
What are transformer-based models?
Architecture: Neural network architecture based on self-attention mechanism, processes sequences without recurrence (unlike RNN/LSTM)
Self-attention: Allows model to weigh importance of different words in sequence, capturing long-range dependencies efficiently
Examples: BERT, GPT (GPT-3, GPT-4), T5; transformer models power most modern NLP systems
Advantages: Parallel processing (faster training), better long-range context, transfer learning capabilities
Applications: Translation, text generation, question answering, summarization, increasingly used beyond NLP (vision transformers)
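Illustration: A minimal sketch of scaled dot-product self-attention, the mechanism named above. As a simplifying assumption it skips the learned query/key/value projections and multiple heads (so Q = K = V = the input vectors); real transformer layers learn those projections.

```python
import math


def softmax(xs):
    m = max(xs)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """tokens: list of equal-length vectors. Returns (outputs, attention_weights)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        # Similarity of this token to every token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, tokens)) for i in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # tokens 0 and 2 are identical
outputs, weights = self_attention(tokens)
print(weights[0])
```

Every token attends to every other token in one step, regardless of distance, and the rows are computed independently of each other; that independence is what allows the parallel processing listed under advantages.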
What is RAG (retrieval-augmented generation) and how does it relate to your expertise?
Definition: Technique combining retrieval systems with generative models - retrieves relevant documents then generates response based on retrieved context
Components: Vector database/search engine for retrieval, embeddings for similarity search, LLM for generation
Advantages: Reduces hallucinations, grounds responses in real data, can access up-to-date information beyond training cutoff
Use cases: Enterprise Q&A systems, chatbots with knowledge bases, document analysis, customer support automation
Technical implementation: Involves chunking documents, creating embeddings, storing in vector DB (Pinecone, Weaviate), semantic search + prompt engineering
Personal expertise: Experience building RAG systems, working with vector databases, optimizing retrieval quality, prompt engineering for better generation
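Illustration: The retrieve-then-generate flow above can be sketched without any external service. Everything here is a stand-in chosen for illustration: word-count vectors replace learned embeddings, an in-memory list replaces a vector database, and the final prompt would be passed to an LLM of your choice (no model is called).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Rank stored chunks by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, context_chunks):
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


chunks = [
    "Refunds are processed within 14 days of the return request.",
    "Our office is closed on public holidays.",
    "Shipping to EU countries takes 3 to 5 business days.",
]
query = "How long do refunds take?"
top_chunks = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)
```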
How is fine-tuning different from regular training?
How do the fine-tuning options and limitations differ between closed-source models (e.g., OpenAI GPT-4) and open-source models that you can self-host?
Closed-source (OpenAI): Limited to API-based fine-tuning, no access to model weights, restricted customization through provided interfaces only
Open-source: Full control over model architecture, can modify any layer, complete fine-tuning flexibility, access to all weights
Cost considerations: Closed-source has per-token costs, open-source requires infrastructure investment but no usage fees
Data privacy: Closed-source sends data to external servers, open-source allows complete on-premise deployment for sensitive data
Technical expertise: Closed-source easier to use with less ML knowledge required, open-source needs deeper understanding of training, hardware, optimization
Examples: Open-source options include Llama 2, Mistral, Falcon - can use LoRA, QLoRA, full fine-tuning with complete control
What parameter-efficient fine-tuning techniques exist for open models (e.g., LoRA, QLoRA), and how do they work conceptually?
LoRA (Low-Rank Adaptation): Freezes pre-trained weights, adds small trainable rank decomposition matrices to layers, drastically reduces trainable parameters
How LoRA works: Instead of updating the full d×k weight matrix W, trains two smaller matrices A (d×r) and B (r×k) with rank r much smaller than d and k, so that W_new = W + AB
QLoRA: Combines LoRA with quantization, loads model in 4-bit precision, trains LoRA adapters in higher precision, enables fine-tuning on consumer GPUs
Memory benefits: With QLoRA, a 65B parameter model can be fine-tuned on a single high-memory GPU, reducing memory needs from hundreds of GB to tens of GB
Other techniques: Prefix tuning (add trainable tokens), adapter layers (insert small modules), prompt tuning (optimize soft prompts)
Practical benefits: Faster training, lower costs, multiple task-specific adapters can share base model, easier deployment and version control
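Illustration: A small sketch of the LoRA update W_new = W + AB and of the parameter savings it gives. The matrix sizes (4096×4096, rank 8) and the tiny 3×3 example are assumptions picked for illustration, and the matrices are plain nested lists rather than framework tensors.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x, y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def lora_param_counts(d, k, r):
    """Trainable parameters: full fine-tuning of a d x k matrix vs. LoRA factors of rank r."""
    return d * k, r * (d + k)


full, lora = lora_param_counts(4096, 4096, 8)
print(full, lora, f"{100 * lora / full:.2f}% of the full matrix")

# Tiny worked example: W stays frozen, only A (3x1) and B (1x3) would be trained.
W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
A = [[0.1], [0.2], [0.3]]
B = [[1.0, 0.0, -1.0]]
W_new = add(W, matmul(A, B))
print(W_new[0])
```

Because W itself never changes, several (A, B) pairs for different tasks can be stored and swapped against one shared base model.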
List vs Tuple in Python? When do you use what?
List: Mutable (can modify after creation), uses square brackets [], slightly more memory and creation overhead, for collections that change
Tuple: Immutable (cannot modify after creation), uses parentheses (), slightly faster to create, less memory, for fixed collections
Use list when: Data will be modified (append, remove, sort), working with dynamic collections, need list methods
Use tuple when: Data should not change (coordinates, RGB values), dictionary keys (if all elements are hashable), returning multiple values from a function, slightly better performance needed
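Illustration: A short sketch of the differences above. The variable names are arbitrary, and the size comparison reflects CPython's behaviour rather than a language guarantee.

```python
import sys

# Tuple: fixed data, usable as a dictionary key.
point = (3, 4)
distance_from_origin = {point: 5.0}

try:
    point[0] = 10            # tuples do not support item assignment
    tuple_was_modified = True
except TypeError:
    tuple_was_modified = False

try:
    {[3, 4]: 5.0}            # lists are unhashable, so they cannot be keys
    list_usable_as_key = True
except TypeError:
    list_usable_as_key = False

# List: a collection that changes over time.
scores = [70, 85]
scores.append(90)
scores.sort(reverse=True)


def min_max(values):
    """Returning several values packs them into a tuple."""
    return min(values), max(values)


lowest, highest = min_max(scores)

# On CPython a tuple is smaller than a list holding the same items.
tuple_is_smaller = sys.getsizeof((1, 2, 3)) < sys.getsizeof([1, 2, 3])
print(scores, lowest, highest, tuple_is_smaller)
```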
What are decorators in Python?
Definition: Functions that modify behavior of other functions/methods, using @decorator_name syntax above function definition
How they work: Take function as input, add functionality, return modified function - implements wrapper pattern
Common uses: Logging, timing functions, authentication/authorization, caching (memoization), validation
Built-in examples: @property, @staticmethod, @classmethod, @lru_cache for performance optimization
Practical benefit: Clean code separation of concerns, reusable cross-cutting functionality, don't repeat yourself (DRY)
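Illustration: A minimal sketch of a custom timing decorator next to the built-in `functools.lru_cache`. The names `timed`, `last_elapsed`, and `slow_square` are hypothetical and exist only for this example.

```python
import functools
import time


def timed(func):
    """Wrap func so that each call records how long it took."""
    @functools.wraps(func)                 # keep the original name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result

    wrapper.last_elapsed = None
    return wrapper


@timed                                     # same as: slow_square = timed(slow_square)
def slow_square(n):
    """Return n squared."""
    return n * n


@functools.lru_cache(maxsize=None)         # built-in memoization decorator
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


print(slow_square(4), slow_square.last_elapsed)
print(fib(30))
```

The timing logic lives in one place and can be applied to any function with a single line, which is the separation of concerns mentioned above.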
What is the purpose of the .groupby() method in the Pandas library?
Explain the difference between a generator and a normal function that returns a list. Why are generators (which use the yield keyword) particularly beneficial when processing very large, potentially memory-intensive datasets in data science?
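Illustration: A minimal sketch contrasting the two approaches. The function names are hypothetical; the point is that the list version materializes every element up front, while the generator produces one value at a time on demand.

```python
import sys


def squares_list(n):
    """Builds and returns the whole list in memory."""
    return [i * i for i in range(n)]


def squares_gen(n):
    """Yields one value at a time; nothing is stored between values."""
    for i in range(n):
        yield i * i


n = 100_000
list_size = sys.getsizeof(squares_list(n))   # grows with n
gen_size = sys.getsizeof(squares_gen(n))     # small and independent of n

# Both can feed an aggregation, but the generator never holds all values at once.
total_from_list = sum(squares_list(n))
total_from_gen = sum(squares_gen(n))
print(list_size, gen_size, total_from_list == total_from_gen)
```

The same pattern applies to reading a large file or database cursor row by row instead of loading it whole.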
Design a complete MLOps pipeline for a production-grade recommendation system handling 1M+ daily predictions. Include model versioning, A/B testing, monitoring, and automated retraining.
Infrastructure: Use MLflow/Kubeflow for model registry and versioning, Docker containers for reproducibility, Kubernetes for orchestration, separate training/serving environments
A/B Testing: Implement feature flags, shadow mode deployment, canary releases with traffic splitting (90/10 → 50/50), statistical significance testing for metrics
Monitoring: Track model metrics (accuracy, latency, throughput), data drift detection, concept drift monitoring, alerting systems (Prometheus/Grafana), logging predictions for debugging
Automated Retraining: Schedule periodic retraining, trigger-based retraining on drift detection, validate new models against holdout set, automated rollback on performance degradation
Data Pipeline: Feature store for consistency, data validation (Great Expectations), preprocessing pipelines, efficient data loading (Parquet/TFRecord), versioned datasets
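Illustration: One possible way to implement the data drift check that triggers retraining is the Population Stability Index (PSI) between a reference feature distribution and recent production data. PSI is a swapped-in technique chosen for this sketch, not something the pipeline above prescribes; the bin count, the smoothing constant, and the 0.2 alert threshold are common rules of thumb assumed here.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` against the `expected` reference sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0          # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1   # clamp out-of-range values
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


DRIFT_THRESHOLD = 0.2                         # assumed alerting threshold

reference = [i / 100 for i in range(1000)]    # feature values seen at training time
stable = list(reference)                      # production data with no shift
shifted = [x + 3.0 for x in reference]        # production data after a shift

should_retrain_stable = psi(reference, stable) > DRIFT_THRESHOLD
should_retrain_shifted = psi(reference, shifted) > DRIFT_THRESHOLD
print(should_retrain_stable, should_retrain_shifted)
```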
You're building a fraud detection system with 99.5% legitimate transactions. Explain how you would handle extreme class imbalance, choose appropriate metrics, and ensure the model doesn't just predict "legitimate" for everything.
Sampling Techniques: SMOTE for oversampling minority class, random undersampling majority class, combined approach, stratified validation splits to maintain class distribution
Metrics: Use Precision-Recall curve and F1-score (NOT accuracy), focus on recall for fraud detection, define acceptable false positive rate, use ROC-AUC with caution on imbalanced data
Cost-Sensitive Learning: Assign higher misclassification cost to false negatives (missed fraud), use class_weight parameter in sklearn, focus on business impact not just accuracy
Algorithm Selection: Tree-based models (XGBoost, Random Forest) often cope well with imbalance when combined with class weighting, anomaly detection approaches (Isolation Forest, One-Class SVM), ensemble methods
Validation Strategy: Time-based validation (not random split), ensure fraud patterns in test set, monitor precision-recall at different thresholds, validate on recent data continuously
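Illustration: A sketch of the cost-sensitive view above, using the 99.5%/0.5% split from the question. The weight formula n_samples / (n_classes * class_count) mirrors the heuristic scikit-learn documents for `class_weight="balanced"`; the misclassification costs and the two sets of predictions are made-up numbers for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weights inversely proportional to class frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def total_cost(y_true, y_pred, fn_cost=500.0, fp_cost=5.0):
    """Business cost: a missed fraud (FN) is assumed far more expensive than a false alarm (FP)."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += fn_cost
        elif t == 0 and p == 1:
            cost += fp_cost
    return cost


labels = [0] * 995 + [1] * 5                      # 99.5% legitimate, 0.5% fraud
weights = balanced_class_weights(labels)

always_legit = [0] * 1000                         # predicts "legitimate" for everything
detector = [0] * 975 + [1] * 20 + [1] * 4 + [0]   # 20 false alarms, 4 of 5 frauds caught

print(weights)
print(accuracy(labels, always_legit), total_cost(labels, always_legit))
print(accuracy(labels, detector), total_cost(labels, detector))
```

Under these assumed costs the detector has lower accuracy than the trivial model yet a much lower total cost, which is why accuracy alone is the wrong target here.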
Design a distributed training strategy for a large-scale deep learning model (BERT-scale) across multiple GPUs and nodes. Include parallelism strategies, gradient synchronization, and optimization techniques.
Data Parallelism: Replicate model on each GPU, split batch across GPUs, synchronize gradients (AllReduce), use DistributedDataParallel in PyTorch or Horovod
Model Parallelism: Split model layers across GPUs when model too large for single GPU, pipeline parallelism for sequential models, tensor parallelism for transformer layers
Gradient Optimization: Gradient accumulation for effective large batch sizes, mixed precision training (FP16/BF16) with automatic loss scaling for FP16, gradient checkpointing to save memory
Communication: Use NCCL for multi-GPU communication, optimize network topology, gradient compression, overlap computation with communication
Framework: DeepSpeed/ZeRO for memory optimization, PyTorch FSDP (Fully Sharded Data Parallel), efficient data loading with prefetching, monitor GPU utilization and bottlenecks
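Illustration: A single-process simulation of the data-parallel step described above: each "worker" computes a gradient on its shard, an averaging step stands in for AllReduce, and every replica applies the same update. This is a conceptual sketch only; it uses a one-parameter linear model and plain Python rather than NCCL, DistributedDataParallel, or any real communication backend.

```python
def local_gradient(w, shard):
    """Mean gradient of the squared error (w*x - y)^2 over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for AllReduce: every worker ends up with the average of all values."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]   # generated by y = 2x
shards = [data[0:2], data[2:4]]                            # equal-sized shard per worker

# With equal shard sizes, the averaged gradient equals the full-batch gradient.
per_worker = [local_gradient(0.0, shard) for shard in shards]
synced = all_reduce_mean(per_worker)
full_batch = local_gradient(0.0, data)

# Training loop: all replicas hold the same w and apply the same synced gradient.
w, lr = 0.0, 0.05
for _ in range(20):
    grads = all_reduce_mean([local_gradient(w, shard) for shard in shards])
    w -= lr * grads[0]
print(per_worker, synced[0], full_batch, w)
```

Gradient accumulation relies on the same identity: averaging gradients from several micro-batches before one optimizer step matches a single larger batch.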
Explain feature engineering strategies for time-series data in a production forecasting system. How do you handle seasonality, trends, and create lag features while avoiding data leakage?
Temporal Features: Extract hour/day/month/year, day of week, weekend indicator, holiday flags, time since last event, cyclical encoding for periodic features (sin/cos transformation)
Lag Features: Create lagged values (t-1, t-7, t-30), rolling statistics (mean, std, min, max over windows), exponential moving averages, ensure no future data in features
Seasonality: Decompose using STL/seasonal_decompose, Fourier features for periodic patterns, seasonal differencing, multiple seasonality handling (daily + weekly + yearly)
Avoiding Data Leakage: Use TimeSeriesSplit for cross-validation, never shuffle time-series data, compute rolling features using only past data, fit learned transformations (scalers, encoders) on the training window only and then apply them to the test window
Advanced Features: Change point detection, autocorrelation features, interaction between temporal and external features, domain-specific features (weather for retail, events for traffic)
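Illustration: A plain-Python sketch of leakage-safe lag features, past-only rolling statistics, cyclical encoding, and expanding-window splits. The helper names and the tiny sales series are assumptions for illustration; in production these steps are typically done with pandas (`shift`, `rolling`) and scikit-learn's `TimeSeriesSplit`.

```python
import math


def lag(series, k):
    """Value from k steps earlier (k >= 1); the first k positions have no history."""
    return [None] * k + series[:-k]


def rolling_mean_past(series, window):
    """Mean of the `window` values strictly before t, so the current value never leaks in."""
    out = []
    for t in range(len(series)):
        past = series[max(0, t - window):t]
        out.append(sum(past) / window if len(past) == window else None)
    return out


def cyclical(value, period):
    """Sin/cos encoding so that the end of a cycle sits next to its start."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_series_splits(n, n_splits, test_size):
    """Expanding-window splits: training indices always precede test indices."""
    splits = []
    for i in range(n_splits):
        test_start = n - (n_splits - i) * test_size
        splits.append((list(range(test_start)), list(range(test_start, test_start + test_size))))
    return splits


sales = [10, 12, 11, 15, 14, 18]
lag_1 = lag(sales, 1)
roll_3 = rolling_mean_past(sales, 3)
splits = time_series_splits(len(sales), n_splits=2, test_size=1)
print(lag_1, roll_3, splits)
print(cyclical(23, 24), cyclical(0, 24))     # hour 23 lands close to hour 0
```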