Mubeen

Reputation: 98

How to find the number of partitions in a DataFrame using Python in Spark, and how to create partitions in a DataFrame with Python in Spark

I have a DataFrame named df and I want to know whether I can create partitions while reading data into a DataFrame.

AND

I also want to understand how to find the number of partitions of a DataFrame. I have seen multiple answers suggesting df.rdd.getNumPartitions(), but by default it returns only 1.

I tried coalesce() and repartition() to change the number of partitions.

Please help.

Upvotes: 1

Views: 636

Answers (1)

Mohana B C

Reputation: 5487

When reading a file into a DataFrame with DataFrameReader, there is no option to specify the number of partitions. Here you can read about the default number of partitions created while reading, or you can change the number of partitions by reading the file as an RDD instead.

Using repartition() you can either increase or decrease the number of partitions, whereas coalesce() can only decrease it.

You might have missed reassigning the repartitioned DataFrame back to a variable; that is why the previous partition count is still being displayed.

df = spark.read.csv('file.csv')
df = df.repartition(10) # reassign to a variable: DataFrames are immutable
# Now check number of partitions
df.rdd.getNumPartitions()

Upvotes: 3
