Vitthal Mirji

07 Dec 2020

Master Spark architecture: Staff-principal study and interview prep in Apache Spark

9,445 words, ~37 min read

Build staff-principal Spark depth from fundamentals to internals. Learn scheduler, memory, shuffle, AQE, Spark SQL, Structured Streaming, and Databricks Delta with troubleshooting runbooks and trade-off-driven decisions.

21 May 2020

Goodbye, WordPress: hello Hugo + nginx with fast builds and sane deploys today

3,217 words, ~12 min read

Deep-dive guide to migrating from WordPress to Hugo. Fix 40-second page loads and 2.5MB transfers with static HTML, GitHub Actions CI/CD, and nginx deployment. Includes export strategy, theme customization, and a secure deployment setup.

01 Jul 2019

Is Hadoop dead? Cloudera-Hortonworks, MapR layoffs, and Hadoop 3.0 reality now

2,813 words, ~11 min read

Deep-dive into Hadoop's state in 2019: the Cloudera+Hortonworks merger, MapR's struggles, and what Hadoop 3.0 delivers (GPU scheduling, Docker, Hive ACID). Learn why AWS, GCP, and Azure dominate, yet Hadoop is evolving toward hybrid-cloud deployments alongside Spark.

04 Nov 2017

Data lakes: Hive is not an RDBMS, HBase 141x faster, Spark's role in practice

5,205 words, ~20 min read

Deep-dive performance comparison of Hive, HBase, and Spark on 9.6M NYC taxi records. Learn why Hive ACID falls short, when HBase is 141x faster for lookups, and what Spark actually optimizes for. Includes benchmarks and usage guidance.


Ingestion to Insight: AI-Orchestrated Data Platform

712 words, ~2 min read

Introduction. This captures the vision for Future of Data Plat...
