Between July 7 and September 5, 2025, I joined AppNava in Ankara as a Data Science Intern. AppNava is a fast-growing AI company providing prediction and optimization tools for mobile games, focusing on player-level metrics like Lifetime Value (LTV), churn, and Return on Ad Spend (ROAS).

The goal of my internship was to build and evaluate machine learning pipelines for LTV prediction, using real-world game data at production scale. The challenge: 1 million+ rows, 479 features, 98% of users with zero LTV, and heavy-tailed distributions.

This post covers the technical workflow in depth:

  • Data exploration and preprocessing
  • Feature selection and transformations
  • Two-stage LTV prediction models
  • Comprehensive evaluations
  • Interpretations of results
  • Parallel R&D efforts at AppNava

1. Data Exploration

The dataset contained 1,032,565 rows and 479 columns of raw player telemetry from a real mobile game.

Key Findings

  • Missingness: 13.7% of entries had missing values, but the target column (ltv) had none.
  • Redundancy: 29 columns were entirely empty; 237 were duplicates of others.
  • Imbalance: 97.89% of users had 0 LTV. 2.11% had non-zero LTV.
  • Distribution: Skewness = 85.28, Kurtosis = 14,060.23. Histograms, Q-Q plots, and boxplots all confirmed strong right skew with extreme outliers.
  • Transformations tested to normalize the target (see the sketch after this list):
    • Log: reduced skewness but preserved the long tail.
    • Box-Cox: reduced skewness to -0.009 (best normalization).
    • Yeo-Johnson: skewness = 0.109, effective but slightly worse.
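
A minimal sketch of that comparison on synthetic heavy-tailed data (the real telemetry is not reproduced here, so the printed skewness values are illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
ltv = rng.lognormal(mean=1.0, sigma=2.0, size=100_000)  # stand-in for non-zero LTV

transforms = {
    "log1p": np.log1p(ltv),
    "Box-Cox": stats.boxcox(ltv)[0],          # requires strictly positive input
    "Yeo-Johnson": stats.yeojohnson(ltv)[0],  # also handles zeros and negatives
}
for name, x in transforms.items():
    print(f"{name:12s} skewness = {stats.skew(x):+.3f}")
```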

2. Data Preprocessing

2.1 Dimensionality Reduction

Removed 29 empty columns, 237 duplicates, and 3 constant-value columns.

Final feature set reduced to 252 columns.

2.2 Missing Value Handling

Missing values were replaced with 0, interpreted as "no event."

This is justified because, in behavioral telemetry, the absence of an action is itself a meaningful signal.
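
A minimal pandas sketch of steps 2.1 and 2.2 together (the DataFrame name and the exact order of operations are assumptions):

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.dropna(axis=1, how="all")             # 29 entirely empty columns
    df = df.loc[:, ~df.T.duplicated()]            # 237 duplicate columns
    df = df.loc[:, df.nunique(dropna=False) > 1]  # 3 constant-value columns
    return df.fillna(0)                           # missing = "no event"
```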

2.3 Feature Selection

Hybrid Lasso + Random Forest: combined Lasso's linear signal with Random Forest's non-linear interactions.

Pure Random Forest importance: selecting 124 features by importance alone outperformed the hybrid, so it became the final feature set (sketched below).
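
A minimal sketch of importance-based selection (the data here is synthetic; only the counts of 252 input and 124 selected features come from the real pipeline):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=5_000, n_features=252, random_state=42)

rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=42)
# threshold=-inf makes max_features the only criterion: keep the top 124.
selector = SelectFromModel(rf, max_features=124, threshold=-float("inf"))
X_selected = selector.fit_transform(X, y)
```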

2.4 Correlation Filtering

I tested removing one feature from each pair with absolute correlation above 0.95.

Removal worsened performance, so it was reverted: highly correlated features can still add value, for example when trees exploit small differences between near-duplicates.
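
A sketch of the (reverted) filter, assuming X is a pandas DataFrame of features:

```python
import numpy as np
import pandas as pd

def high_corr_columns(X: pd.DataFrame, threshold: float = 0.95) -> list[str]:
    corr = X.corr().abs()
    # keep only the upper triangle so each pair is checked once
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]

# X = X.drop(columns=high_corr_columns(X))  # tested, then reverted
```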

3. Modeling Approaches

The two-stage pipeline:

  • Stage 1 – Classification: predict whether a player will spend at all.
  • Stage 2 – Regression: predict how much, conditional on spending.
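
The two stages are combined at scoring time; a common rule (the production combination rule is an assumption here) multiplies the spend probability by the conditional amount:

```python
import numpy as np

def predict_ltv(clf, reg, X) -> np.ndarray:
    p_spend = clf.predict_proba(X)[:, 1]       # stage 1: probability of spending
    amount = reg.predict(X)                    # stage 2: amount, given spending
    return p_spend * np.clip(amount, 0, None)  # expected LTV per player
```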

I implemented and compared three pipelines:

3.1 Naive Heuristic Baseline

Classifier: always predicts “non-spender.”

Regressor: assigns the mean spender LTV (9.35) to every player.

Results:

  • Classification: ROC AUC = 0.50, F1 = 0.00.
  • Regression: MAE = 9.37, RMSE = 10.11, R² = -4.51.

Usefulness: only as a lower-bound baseline.
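
For reference, this baseline can be reproduced with scikit-learn's dummy estimators (9.35 is the mean spender LTV quoted above):

```python
from sklearn.dummy import DummyClassifier, DummyRegressor

naive_clf = DummyClassifier(strategy="most_frequent")           # always "non-spender"
naive_reg = DummyRegressor(strategy="constant", constant=9.35)  # mean spender LTV
```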

3.2 Logistic Regression + Tweedie Regressor

  • Classifier: Logistic Regression with class weighting.
    • Oversampling (SMOTE, ADASYN) was tested but performed poorly on this sparse, large-scale data.
    • Class weighting worked best, with {0: 1, 1: 10} the strongest setting.
    • Metrics: ROC AUC = 0.9472, F1 = 0.5425, AP = 0.5225.
  • Regressor: Tweedie GLM (Compound Poisson-Gamma), which naturally models zero-inflated outcomes.
    • Spearman = 0.40, MedAE = 4.18.
    • RMSE = 30.01 (driven by high variance among large outliers).
    • R² = -0.14 (worse than the mean baseline).
  • Overall: worked decently for low and mid spenders, failed for high-value users. A sketch of this pipeline follows the list.
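
A minimal sketch of the second pipeline (the Tweedie power of 1.5 and the solver settings are assumptions; only the class weights come from the experiments above):

```python
from sklearn.linear_model import LogisticRegression, TweedieRegressor

clf = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000)
# power in (1, 2) gives the Compound Poisson-Gamma family: a point mass at
# zero plus a continuous positive part.
reg = TweedieRegressor(power=1.5, link="log", max_iter=1000)
```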

3.3 LightGBM Classifier + LightGBM Regressor

  • Classifier: LightGBM with class weighting; hyperparameters tuned with Optuna's LightGBMTunerCV.
    • Cross-validation: StratifiedKFold (5 splits).
    • Metrics: ROC AUC = 0.9626, F1 = 0.5931, AP = 0.6029.
  • Regressor: LightGBM with quantile loss (α = 0.6), which is robust to long tails.
    • MAE = 8.72
    • RMSE = 24.06
    • MedAE = 4.34
    • Spearman = 0.44
    • R² = 0.18
  • Outperformed Tweedie on high-value LTV users, where the business impact is greatest. A sketch follows the list.
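
A minimal sketch of the third pipeline (the class weights and most hyperparameters shown are assumptions; the real values came from LightGBMTunerCV):

```python
from lightgbm import LGBMClassifier, LGBMRegressor
from sklearn.model_selection import StratifiedKFold

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

clf = LGBMClassifier(class_weight={0: 1, 1: 10}, n_estimators=500)
# alpha=0.6 targets the 60th percentile, which damps the pull of extreme
# whales compared with a squared-error objective.
reg = LGBMRegressor(objective="quantile", alpha=0.6, n_estimators=500)
# e.g. cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
```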

4. Comprehensive Evaluation

4.1 Classification Comparison

Model                 F1     ROC AUC   Log Loss   MCC     AP
Naive                 0.00   0.50      0.1015     0.00    0.0209
Logistic Regression   0.54   0.9472    0.0750     0.539   0.5225
LightGBM              0.59   0.9626    0.0835     0.586   0.6029

Takeaway: both Logistic Regression and LightGBM worked well, but LightGBM had the best balance of recall and precision on minority spenders.

4.2 Regression Comparison

Model      MAE    RMSE    MedAE   Spearman   R²
Naive      9.37   10.11   9.35    0.00       -4.51
Tweedie    8.78   30.01   4.18    0.40       -0.14
LightGBM   8.72   24.06   4.34    0.44       0.18

Takeaway: Tweedie slightly better for MedAE (median error, robust to outliers), but LightGBM dominates across other metrics and uniquely achieves positive R².

4.3 Segment-Based Analysis

  • Players were grouped into 3 LTV segments (per-segment evaluation sketched after this list):
    • 0–5 (low spenders)
    • 5–15 (mid spenders)
    • 15+ (whales)
  • Tweedie Results:
    • Low: MedAE = 3.49, Spearman = 0.16.
    • Mid: MedAE = 5.26, Spearman = 0.30.
    • High: MedAE = 22.82, Spearman = 0.08 → almost random.
  • LightGBM Results:
    • Low: MedAE = 2.64, Spearman = 0.13.
    • Mid: MedAE = 6.06, Spearman = 0.24.
    • High: MedAE = 15.31, Spearman = 0.34, Pearson = 0.38.
  • Interpretation:
    • Tweedie: stable for low spenders.
    • LightGBM: strong for high-value whales, which is business-critical since whales drive revenue.
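
A sketch of the per-segment evaluation (the series names are assumptions; the segment edges are the ones above):

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.metrics import median_absolute_error

def segment_report(y_true: pd.Series, y_pred: pd.Series) -> pd.DataFrame:
    seg = pd.cut(y_true, bins=[0, 5, 15, np.inf], include_lowest=True,
                 labels=["low (0-5)", "mid (5-15)", "whales (15+)"])
    rows = []
    for name, idx in y_true.groupby(seg, observed=True).groups.items():
        rows.append({
            "segment": name,
            "MedAE": median_absolute_error(y_true.loc[idx], y_pred.loc[idx]),
            "Spearman": spearmanr(y_true.loc[idx], y_pred.loc[idx]).correlation,
        })
    return pd.DataFrame(rows)
```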

5. R&D at AppNava: Sequential and Deep Learning Approaches

5.1 LSTMs & GRUs

  • Used on session-level sequences (logins, purchases, ad views).
  • Captured time-to-first-purchase dynamics and churn signals.
  • Outperformed static models in early-stage LTV prediction (first 24–48h).

5.2 Transformers for User Event Sequences

  • Event streams were embedded into contextual vectors (actions, timestamps, metadata).
  • Transformer’s self-attention captured long-range dependencies — e.g., a purchase on day 3 influencing LTV weeks later.
  • They are being tested against LSTM baselines and have shown better stability on sparse data (generic sketch below).
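
To make the idea concrete, here is a generic PyTorch sketch of a transformer encoder over event-ID sequences. This is not AppNava's architecture; the vocabulary size, dimensions, and mean-pooling head are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class EventTransformer(nn.Module):
    def __init__(self, n_event_types: int = 1000, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(n_event_types, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # predicted LTV

    def forward(self, event_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(event_ids))       # (batch, seq, d_model)
        return self.head(h.mean(dim=1)).squeeze(-1)   # mean-pool the sequence

# usage: EventTransformer()(torch.randint(0, 1000, (32, 200)))
```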

5.3 Temporal Convolutional Networks (TCNs)

  • Explored for scalability → dilated convolutions handled long histories in parallel.
  • Practical for millions of daily active users where RNNs became bottlenecks.

5.4 Hybrid Models

  • Ongoing work combined:
    • Boosting (LightGBM/XGBoost) on static tabular features.
    • Transformers/LSTMs on raw sequential events.
    • Ensembles of the two provided both interpretability and predictive accuracy.

6. Key Learnings

  • Two-stage modeling is mandatory for zero-inflated outcomes.
  • Gradient boosting (LightGBM) is robust against high-dimensional, skewed, and imbalanced data.
  • Simple techniques (Box-Cox, class weighting) often outperform more complex oversampling methods at scale.
  • Segment-specific evaluation is critical: Tweedie ≈ low-LTV stability, LightGBM ≈ high-LTV accuracy.
  • Sequential deep models (LSTMs, Transformers, TCNs) are actively researched and already promising for early prediction of whales.

7. Reflection

This internship gave me hands-on experience in end-to-end ML pipelines, from BigQuery preprocessing to model design, evaluation, and interpretation. I also witnessed how cutting-edge R&D in sequence modeling is shaping the future of gaming analytics.

The key takeaway: LTV prediction is not just a technical challenge but a strategic one. Identifying whales early can change how studios allocate marketing budgets, personalize gameplay, and design monetization strategies.

At AppNava, my work on boosting-based models complemented ongoing R&D on deep learning architectures, making it clear that the future lies in hybrid approaches that combine tabular + sequential signals.