#data-engineering
Articles with this tag
SQL stored procedures and functions are versatile tools that let users develop reusable, optimized code for handling intricate database...
Slowly Changing Dimensions (SCDs) are a vital concept in data warehousing, particularly in managing data that changes over time. As the entities...
What is Databricks? Databricks, developed by the creators of Spark, offers a comprehensive solution for all data needs. From storage to insights via...
A Delta Lake is essentially a Parquet file with a robust versioning system. It uses transaction logs stored in JSON files to maintain a...
1. Broadcast Join: When joining a larger DataFrame with a smaller one in PySpark, the conventional Spark join operation...
Spark's Execution Plan is a series of operations carried out to translate SQL statements into a set of logical and physical operations. In short, it...