Reputation: 619
How can I create a for loop that doesn't overwrite the existing dataframes?
for df in 2011, 2012, 2013:
    df = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')
Right now the for loop above iterates over each of the existing dataframes. How can I make it so the for loop creates a bunch of new dataframes?
2011_pivot, 2012_pivot, 2013_pivot
Upvotes: 2
Views: 161
Reputation: 164613
Don't create variables needlessly. Use a dict or list instead, e.g. via a dictionary or list comprehension.
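As a rough sketch of the dictionary-comprehension approach: the toy data and the names `frames` and `pivots` below are made up for illustration, and the real yearly dataframes would take the place of the toy ones.

```python
import pandas as pd

# Hypothetical toy frames standing in for the real yearly data.
frames = {
    2011: pd.DataFrame({"income": ["low", "low", "high"],
                        "area": ["north", "south", "north"],
                        "id": [1, 2, 3]}),
    2012: pd.DataFrame({"income": ["low", "high", "high"],
                        "area": ["north", "north", "south"],
                        "id": [4, 5, 6]}),
}

# One pivot table per year, keyed by year; the input frames are left untouched.
pivots = {
    year: pd.pivot_table(df, index="income", columns="area",
                         values="id", aggfunc="count")
    for year, df in frames.items()
}

print(pivots[2011])
```

Each result is then looked up by key, e.g. `pivots[2011]`, instead of through a separately named variable.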
Alternatively, consider MultiIndex columns and a single pd.pivot_table call:
dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)
df = pd.pivot_table(comb, index='income', columns=['year', 'area'],
                    values='id', aggfunc='count')
Then you can use regular indexing methods to filter for a particular year, e.g.
pivot_2011 = df.iloc[:, df.columns.get_level_values(0) == 2011]
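For a quick check of the combined approach, here is a small self-contained sketch with made-up data (the toy frames and the name `combined_pivot` are assumptions for illustration). It also shows that indexing by the top-level key, `combined_pivot[2011]`, is an alternative to the boolean mask; it drops the year level from the resulting columns.

```python
import pandas as pd

# Hypothetical toy frames; the real df_2011, df_2012, ... would be loaded elsewhere.
dfs = {
    2011: pd.DataFrame({"income": ["low", "high"],
                        "area": ["north", "north"],
                        "id": [1, 2]}),
    2012: pd.DataFrame({"income": ["low", "low"],
                        "area": ["north", "south"],
                        "id": [3, 4]}),
}

# Tag each frame with its year, stack them, and pivot once.
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)
combined_pivot = pd.pivot_table(comb, index="income", columns=["year", "area"],
                                values="id", aggfunc="count")

# A boolean mask on the first column level keeps the (year, area) MultiIndex.
mask = combined_pivot.columns.get_level_values(0) == 2011
pivot_2011_masked = combined_pivot.loc[:, mask]

# Indexing by the top-level key selects the same data with the year level dropped.
pivot_2011 = combined_pivot[2011]
```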
Upvotes: 0
Reputation: 2939
I would generally discourage you from creating lots of variables with related names, which is a dangerous design pattern in Python (although it's common in SAS, for example). A better option is to create a dictionary of dataframes, with the key serving as your 'variable name':
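df_dict = dict()
# Map each label to its dataframe; a DataFrame has no built-in .name attribute.
frames = {"2011": df_2011, "2012": df_2012, "2013": df_2013}
for name, df in frames.items():
    df_dict["pivot_" + name] = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')
I'm assuming here that your dataframes are available as df_2011, df_2012 and df_2013, since 2011, 2012 and 2013 are not valid Python variable names.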
Upvotes: 3
Reputation: 1576
I don't see any other way but to create a list or a dictionary of data frames; otherwise you'd have to name them manually.
# df_2011, df_2012, df_2013 are assumed names for the existing dataframes
df_list = [pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count') for df in (df_2011, df_2012, df_2013)]
You can find an example here.
Upvotes: 1