Reputation: 988
I have a main dataframe, df_PROD. For each year in a certain range, I want to filter the matching records from the main df and, if there are more than 0 records, put them into a separate df (i.e. df_PROD_year) and append that year to a list for later use.
I am trying to create dynamic dataframe names inside a for loop, as below. If a year has more than 0 records, I keep them in a separate df_year and append that year to another list.
PROD_years_list = []
year = int(datetime.datetime.today().year)
for i in range(year, 2016, -1):
    print(i)
    df_PROD_{i} = df_PROD.filter(col("Year") == i)
    if df_PROD_{i}.count() > 0:
        PROD_years_list.append(i)
print(PROD_years_list)
But I get an invalid syntax error for the line:
df_PROD_{i} = df_PROD.filter(col("Year") == i)
How can I dynamically name a dataframe inside a for loop? Thanks.
Upvotes: 1
Views: 3420
Reputation: 32670
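Python does not let you build a variable name with {i} on the left side of an assignment, which is why that line is a syntax error. Using a dict is probably a better option for your needs. You store each dataframe with the corresponding year as a key: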
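import datetime
from pyspark.sql.functions import col

PROD_years = {}  # maps year -> filtered dataframe
year = datetime.datetime.today().year
for i in range(year, 2016, -1):
    df = df_PROD.filter(col("Year") == i)
    # keep only the years that actually have records
    if df.count() > 0:
        PROD_years[i] = df
print(PROD_years)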
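With a dict, the keys also give you the list of years the question asks for, so a separate list may not be needed. Here is a minimal sketch of that pattern, using plain Python lists as a stand-in for Spark dataframes so it runs without a cluster. The records data and the group_by_year helper are hypothetical and only for illustration:

```python
# Hypothetical stand-in data: plain dicts instead of Spark rows.
records = [
    {"Year": 2018, "value": "a"},
    {"Year": 2018, "value": "b"},
    {"Year": 2020, "value": "c"},
]


def group_by_year(rows, first_year, last_year):
    """Return {year: rows for that year}, skipping years with no rows."""
    by_year = {}
    # Walk from the latest year down to the earliest, like the loop above.
    for year in range(last_year, first_year - 1, -1):
        subset = [r for r in rows if r["Year"] == year]
        if subset:  # keep only years that have at least one record
            by_year[year] = subset
    return by_year


prod_years = group_by_year(records, 2017, 2021)
prod_years_list = list(prod_years)  # the dict keys double as the list of years
print(prod_years_list)              # [2020, 2018]
print(len(prod_years[2018]))        # 2
```

Looking up a year is then just prod_years[2018], and a check like 2019 in prod_years tells you whether that year had any records.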
Upvotes: 2