Reputation: 670
I am trying to generate a single PySpark dataframe out of dict_values, the way I can with the pandas concat function. The dictionary has yearmonth values as keys and PySpark dataframes as values.
Here is the code I am using. The one alternative I have is to unionAll all the dataframes, which I don't believe is the best way to achieve this.
# keys is an iterable of yearmonth values (e.g. 201501);
# df is the dataframe built for that month
dict_ym = {}
for yearmonth in keys:
    key_name = 'df_' + str(yearmonth)
    dict_ym[key_name] = df
    # Add a new column to the dataframe
    # Perform some more transformations
dict_ym
Now the above dict has keys of the form yearmonth, e.g. 201501, and each value is a dataframe consisting of 10 columns.
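To make the structure concrete, here is a minimal toy version of dict_ym (hypothetical two-column dataframes standing in for the real 10-column ones):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-ins for the real per-month dataframes,
# just to show the shape of dict_ym
dict_ym = {
    'df_201501': spark.createDataFrame([(1, 'a')], ['id', 'val']),
    'df_201502': spark.createDataFrame([(2, 'b')], ['id', 'val']),
}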
from functools import reduce
from pyspark.sql import DataFrame

def union_all_dataframes(*dfs):
    return reduce(DataFrame.unionAll, dfs)
df2 = union_all_dataframes(dict_ym['df_201501'], dict_ym['df_201502'], ... and so on till dict_ym['df_201709'])
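A minimal sketch of the same call without spelling out every key, assuming the sorted key order is the row order I want (the fixed-width yearmonth keys sort chronologically):

# Unpack the dict values instead of naming each key
df2 = union_all_dataframes(*(dict_ym[k] for k in sorted(dict_ym)))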
But with pandas dataframes I can do something like the following, which appends all the dataframes one below the other:
df2 = pd.concat(dict_ym.values())  # here dict_ym holds pandas dataframes rather than Spark ones
I think there would be a more elegant way to create the PySpark dataframe as well, something similar to pandas.concat.
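The closest one-liner I have found so far is to apply the same reduce directly over the dict values (a minimal sketch; unionAll matches columns by position, so it assumes every dataframe has the same schema):

from functools import reduce
from pyspark.sql import DataFrame

# pd.concat-style one-liner: fold unionAll over every dataframe in the dict
df2 = reduce(DataFrame.unionAll, dict_ym.values())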
Upvotes: 0
Views: 804