Reputation: 9
I'm trying to take a DataFrame, iterate over each column starting with the second, and copy the first (constant) column together with each of the following columns, one at a time, into a new DataFrame.
import pandas as pd
df = pd.DataFrame({'Year': [2001, 2002, 2003, 2004, 2005], 'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50], 'c': [0.1, 0.2, 0.3, 0.4, 0.5]})
df
I want a result similar to what the code below produces, but I need it in a loop, since I can have up to 40 columns to run logic on.
df_a = df[['Year', 'a']].copy()
df_b = df[['Year', 'b']].copy()
df_c = df[['Year', 'c']].copy()
print(df_a)
print(df_b)
print(df_c)
It would also be nice to know how to name each new DataFrame after the column it copies (e.g. df_a for column 'a'). Thank you so much, and sorry if this is a duplicate.
Upvotes: 1
Views: 1262
Reputation: 164623
You don't need to create a dictionary to copy and access the data you require. You can simply copy your DataFrame once (DataFrame.copy() is deep by default, though it does not recursively copy Python objects stored in object columns) and then use indexing to access a particular series:
dfs = df.set_index('Year').copy()
print(dfs['a'])
Year
2001    1
2002    2
2003    3
2004    4
2005    5
Name: a, dtype: int64
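You can iterate over your columns via pd.DataFrame.items (called iteritems in older pandas versions; iteritems was removed in pandas 2.0):
for key, series in dfs.items():
    print(key, series)
Yes, this gives you Series rather than DataFrames, but they are easy to convert: series.reset_index() returns a DataFrame with Year as a regular column (like df_a in the question), while series.to_frame() returns a one-column DataFrame that keeps Year as the index.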
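Putting those pieces together, here is a minimal sketch that rebuilds the question's df_a, df_b and df_c, one per loop iteration. It assumes the sample df from the question; the dictionary name frames is just illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Year': [2001, 2002, 2003, 2004, 2005],
                   'a': [1, 2, 3, 4, 5],
                   'b': [10, 20, 30, 40, 50],
                   'c': [0.1, 0.2, 0.3, 0.4, 0.5]})

# Move the constant column into the index so only the data columns remain
dfs = df.set_index('Year').copy()

# Illustrative container: one two-column DataFrame per data column
frames = {}
for key, series in dfs.items():
    # reset_index() turns the Year index back into a regular column
    frames[key] = series.reset_index()

print(frames['b'])
```

Because the loop only touches dfs.columns, it works the same for 3 columns or 40.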
Upvotes: 0
Reputation: 8631
You can make a dictionary of DataFrames like below, with each column name as the key and the corresponding sub-DataFrame as the value.
df = df.set_index('Year')
dict_ = {col: df[[col]].reset_index() for col in df.columns}
You can then use the column name as the key to get the corresponding DataFrame:
dict_['a']
Output:
   Year  a
0  2001  1
1  2002  2
2  2003  3
3  2004  4
4  2005  5
You can iterate over dict_ like this (the loop variable is named sub_df so that it does not overwrite df):
for col, sub_df in dict_.items():
    print("-" * 40)  # just for separation
    print(sub_df)  # or print(dict_[col])
Output:
----------------------------------------
   Year  a
0  2001  1
1  2002  2
2  2003  3
3  2004  4
4  2005  5
----------------------------------------
   Year   b
0  2001  10
1  2002  20
2  2003  30
3  2004  40
4  2005  50
----------------------------------------
   Year    c
0  2001  0.1
1  2002  0.2
2  2003  0.3
3  2004  0.4
4  2005  0.5
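A sketch of the same dictionary approach that leaves the original frame untouched by giving the indexed version its own name, run on made-up data with 40 value columns to match the scale mentioned in the question. The names wide, indexed and totals, and the column-sum "logic", are hypothetical placeholders for your own data and processing.

```python
import pandas as pd

# Made-up data: a Year column plus 40 value columns named col_0 ... col_39
years = [2001, 2002, 2003, 2004, 2005]
values = {f'col_{i}': [i * k for k in range(5)] for i in range(40)}
wide = pd.DataFrame({'Year': years, **values})

# Index on Year under a new name so `wide` itself is left unchanged
indexed = wide.set_index('Year')
dict_ = {col: indexed[[col]].reset_index() for col in indexed.columns}

# Example per-column logic: sum the value column of each sub-DataFrame
totals = {}
for col, sub_df in dict_.items():
    totals[col] = sub_df[col].sum()

print(dict_['col_3'])
print(totals['col_2'])
```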
Upvotes: 1
Reputation: 51335
I'd suggest splitting it with a dict comprehension; you'll then have a dictionary of your separate DataFrames. For example:
dict_of_frames = {f'df_{col}':df[['Year', col]] for col in df.columns[1:]}
This gives you a dictionary containing df_a, df_b and df_c, which you can access as you would any other dictionary:
>>> dict_of_frames['df_a']
   Year  a
0  2001  1
1  2002  2
2  2003  3
3  2004  4
4  2005  5
>>> dict_of_frames['df_b']
   Year   b
0  2001  10
1  2002  20
2  2003  30
3  2004  40
4  2005  50
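The comprehension above relies on Year being the first column (df.columns[1:]). A minimal sketch that selects the constant column by name instead, using Index.drop, and calls .copy() as the question's code did so each frame is explicitly independent of df. The column order in this sample is deliberately shuffled, and the variable key is an illustrative name.

```python
import pandas as pd

# Same data as the question, but Year is no longer the first column
df = pd.DataFrame({'c': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'Year': [2001, 2002, 2003, 2004, 2005],
                   'a': [1, 2, 3, 4, 5],
                   'b': [10, 20, 30, 40, 50]})

key = 'Year'  # the constant column to carry into every frame
# df.columns.drop(key) lists every other column, wherever Year sits
dict_of_frames = {f'df_{col}': df[[key, col]].copy()
                  for col in df.columns.drop(key)}

print(dict_of_frames['df_a'])
```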
Upvotes: 2