user07

Reputation: 670

Create a pyspark dataframe from dict_values

I am trying to generate a PySpark DataFrame out of dict_values. I can achieve the same thing in pandas with the concat function. The dictionary consists of yearmonth keys and PySpark DataFrames as values.

Here is the code I am using. My only alternative is to union all the DataFrames one by one, which I believe is not the best way to achieve this.

# keys is a list of yearmonth values (e.g. 201501) and df is the source
# DataFrame; both are built earlier and omitted here
dict_ym = {}
for yearmonth in keys:
    key_name = 'df_' + str(yearmonth)
    dict_ym[key_name] = df
    # Add a new column to the dataframe
    # Perform some more transformations

dict_ym

# The dict now has keys like 'df_201501' (one per yearmonth) and each value is a DataFrame with 10 columns

from functools import reduce
from pyspark.sql import DataFrame

def union_all_dataframes(*dfs):
    # Chain DataFrame.unionAll across every DataFrame passed in
    return reduce(DataFrame.unionAll, dfs)

df2 = union_all_dataframes(dict_ym['df_201501'], dict_ym['df_201502'], ... and so on up to dict_ym['df_201709'])

But with pandas DataFrames I can do something like the following, which appends all the DataFrames one below the other:

df2 = pd.concat(dict_ym.values())  # here dict_ym holds pandas DataFrames instead of Spark DataFrames

I think there should be a more elegant way to build the PySpark DataFrame as well, something similar to pandas.concat.

Upvotes: 0

Views: 804

Answers (1)

Suresh

Reputation: 5870

Try this,

df2 = union_all_dataframes(*dict_ym.values())
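The * operator unpacks every DataFrame in dict_ym.values() into separate positional arguments, so reduce folds DataFrame.unionAll over all of them in one call, much like pd.concat does over a collection of pandas DataFrames. Here is a minimal self-contained sketch; the SparkSession setup and the two tiny per-month DataFrames are made up purely for illustration:

from functools import reduce
from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()

def union_all_dataframes(*dfs):
    # Fold unionAll across all DataFrames, stacking rows one below the other
    return reduce(DataFrame.unionAll, dfs)

# Hypothetical stand-ins for the per-yearmonth DataFrames in dict_ym
dict_ym = {
    'df_201501': spark.createDataFrame([(201501, 10)], ['yearmonth', 'value']),
    'df_201502': spark.createDataFrame([(201502, 20)], ['yearmonth', 'value']),
}

df2 = union_all_dataframes(*dict_ym.values())
df2.show()

One caveat: unionAll matches columns by position, so every DataFrame in the dict must share the same column order. On Spark 2.3+ you can pass DataFrame.unionByName to reduce instead to match columns by name.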

Upvotes: 1
