Ajinkya

Reputation: 81

How to create dynamic DataFrame names in PySpark

I'm trying to create DataFrames with dynamic names in PySpark, but the code below doesn't create a new DataFrame for each file. I only end up with the last DataFrame name, and I need all of them.

for prime2 in pdf2:
    ol2 =  Bucket_path + prime2['S3_File_with_Path']
    t = 1
    sd = {}  
    testR = "df" + str(t)
    print("testR",testR)
    sd[testR] = spark.read.format("parquet").load(ol2).cache() 
    t = t + 1 
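Stripped of Spark, the loop above behaves like the minimal reproduction below, where plain strings (hypothetical file names) stand in for the DataFrames. Because `sd` and `t` are both reassigned at the top of every iteration, only one entry survives, and its key is always `df1`:

```python
# Same loop shape as the question, with strings standing in for DataFrames.
paths = ["a.parquet", "b.parquet", "c.parquet"]
for p in paths:
    t = 1        # counter reset on every iteration, so the key is always "df1"
    sd = {}      # dict re-created on every iteration, discarding earlier entries
    sd["df" + str(t)] = p
    t = t + 1

print(sd)  # {'df1': 'c.parquet'}
```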

Upvotes: 0

Views: 3778

Answers (1)

Rayan Ral

Reputation: 1859

It looks like you're creating the dict inside the loop, so it is replaced on every iteration and you end up with a dict holding only one (the last) entry. Try changing the code to something like this:

sd = {}  # create the dict once, before the loop
t = 1    # initialize the counter once too, so the keys become df1, df2, ...
for prime2 in pdf2:
    ol2 = Bucket_path + prime2['S3_File_with_Path']
    testR = "df" + str(t)
    print("testR", testR)
    df = spark.read.format("parquet").load(ol2).cache()
    sd[testR] = df
    t = t + 1

# sd is available here and holds all the DataFrames
print(len(sd))
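The same idea can be written more compactly with `enumerate` and a dict comprehension, which removes the manual counter entirely. This is a minimal sketch: `load_frames` is a hypothetical helper name, and it assumes (as the question's loop implies) that each item in `pdf2` supports `["S3_File_with_Path"]` indexing, e.g. a dict or a collected Spark `Row`. The loader is passed in as a function so the naming logic can be tried without a Spark session; the `s3://my-bucket/` prefix in the demo is a made-up placeholder.

```python
from typing import Any, Callable, Dict, Iterable, TypeVar

T = TypeVar("T")


def load_frames(
    rows: Iterable[Any],
    bucket_path: str,
    load: Callable[[str], T],
) -> Dict[str, T]:
    """Return {"df1": ..., "df2": ..., ...} with one entry per row.

    Each row must support row["S3_File_with_Path"], as in the question's loop.
    """
    # enumerate(..., start=1) replaces the manual t counter, so it cannot be
    # accidentally reset inside the loop.
    return {
        f"df{i}": load(bucket_path + row["S3_File_with_Path"])
        for i, row in enumerate(rows, start=1)
    }


# With Spark (not run here), pass a loader that reads and caches each file:
#   sd = load_frames(pdf2, Bucket_path,
#                    lambda path: spark.read.format("parquet").load(path).cache())
#   sd["df2"].show()

# Demo without Spark: the stand-in loader just returns the full path it got.
rows = [{"S3_File_with_Path": "a.parquet"}, {"S3_File_with_Path": "b.parquet"}]
frames = load_frames(rows, "s3://my-bucket/", lambda path: path)
print(frames)  # {'df1': 's3://my-bucket/a.parquet', 'df2': 's3://my-bucket/b.parquet'}
```

Keeping the DataFrames in a dict keyed by generated names is generally preferable to creating separate variables like `df1`, `df2` dynamically, since you can loop over `sd.items()` or look up `sd["df3"]` directly.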

Upvotes: 2
