Marina Goldylocks

Reputation: 9

Pandas for loop to copy columns to separate dataframe, rename df accordingly

I'm trying to take a dataframe, iterate over each column starting with the second, and copy the constant first column plus each of the remaining columns, one at a time, into a new dataframe.

import pandas as pd

df = pd.DataFrame({'Year': [2001, 2002, 2003, 2004, 2005], 'a': [1, 2, 3, 4, 5], 'b': [10, 20, 30, 40, 50], 'c': [0.1, 0.2, 0.3, 0.4, 0.5]})
df

I want a result similar to what the code below outputs, but I need it done in a loop, since I can have up to 40 columns to run logic on.

df_a = df[['Year', 'a']].copy()
df_b = df[['Year', 'b']].copy()
df_c = df[['Year', 'c']].copy()
print(df_a)
print(df_b)
print(df_c)

It would also be nice to know how to name each new dataframe df_['name of column it's copying']. Thank you so much, and sorry if it's a duplicate.

Upvotes: 1

Views: 1262

Answers (3)

jpp

Reputation: 164623

You don't need to create a dictionary to copy and access the data you require. You can simply copy your dataframe (deep copy if you have mutable elements) and then use indexing to access a particular series:

dfs = df.set_index('Year').copy()

print(dfs['a'])

Year
2001    1
2002    2
2003    3
2004    4
2005    5
Name: a, dtype: int64

You can iterate over your columns via pd.DataFrame.items (pd.DataFrame.iteritems was deprecated in pandas 1.5 and removed in 2.0):

for key, series in dfs.items():
    print(key, series)

Yes, this gives series, but they can easily be converted to dataframes via series.reset_index() or series.to_frame().
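
As a minimal sketch of that conversion, assuming the df from the question (the names frame_indexed and frame_flat are made up for illustration): to_frame() keeps Year as the index, while reset_index() moves it back into a regular column.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2001, 2002, 2003, 2004, 2005],
    'a': [1, 2, 3, 4, 5],
    'b': [10, 20, 30, 40, 50],
    'c': [0.1, 0.2, 0.3, 0.4, 0.5],
})

dfs = df.set_index('Year')

# to_frame() wraps the series in a one-column dataframe; Year stays the index
frame_indexed = dfs['a'].to_frame()

# reset_index() turns the Year index back into an ordinary column
frame_flat = dfs['a'].reset_index()

print(frame_indexed.columns.tolist())  # ['a']
print(frame_flat.columns.tolist())     # ['Year', 'a']
```

Which one fits better probably depends on whether the later logic wants Year as an index or as a column.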

Upvotes: 0

harpan

Reputation: 8631

You can build a dictionary of dataframes as shown below, with the column name as the key and the sub-dataframe as the value.

df = df.set_index('Year')
dict_ = {col: df[[col]].reset_index() for col in df.columns}

You can then use a column name as the key to get the corresponding dataframe.

dict_['a']

Output:

    Year    a
0   2001    1
1   2002    2
2   2003    3
3   2004    4
4   2005    5

You can iterate over dict_ with:

for col, sub_df in dict_.items():  # sub_df avoids overwriting the original df
    print("-"*40)  # just for separation
    print(sub_df)  # or print(dict_[col])

Output:

----------------------------------------
   Year  a
0  2001  1
1  2002  2
2  2003  3
3  2004  4
4  2005  5
----------------------------------------
   Year   b
0  2001  10
1  2002  20
2  2003  30
3  2004  40
4  2005  50
----------------------------------------
   Year    c
0  2001  0.1
1  2002  0.2
2  2003  0.3
3  2004  0.4
4  2005  0.5

Upvotes: 1

sacuL

Reputation: 51335

I'd suggest splitting it through a dict comprehension, then you'll have a dictionary of your separate dataframes. For example:

dict_of_frames = {f'df_{col}': df[['Year', col]].copy() for col in df.columns[1:]}

This gives you a dictionary with the keys df_a, df_b and df_c, which you can access as you would any other dictionary:

>>> dict_of_frames['df_a']
   Year  a
0  2001  1
1  2002  2
2  2003  3
3  2004  4
4  2005  5

>>> dict_of_frames['df_b']
   Year   b
0  2001  10
1  2002  20
2  2003  30
3  2004  40
4  2005  50
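
Since the question mentions running logic on up to 40 columns, here is a rough sketch of looping over such a dictionary. The per-frame logic (summing the value column) and the name totals are placeholders invented for illustration; the real logic would go in their place.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2001, 2002, 2003, 2004, 2005],
    'a': [1, 2, 3, 4, 5],
    'b': [10, 20, 30, 40, 50],
    'c': [0.1, 0.2, 0.3, 0.4, 0.5],
})

# One independent two-column frame per value column, keyed as df_<column name>
dict_of_frames = {f'df_{col}': df[['Year', col]].copy() for col in df.columns[1:]}

totals = {}  # hypothetical result container
for name, frame in dict_of_frames.items():
    value_col = frame.columns[1]           # the column sitting next to Year
    totals[name] = frame[value_col].sum()  # placeholder for the real logic

print(sorted(totals))  # ['df_a', 'df_b', 'df_c']
```

Keeping the frames in a dictionary, rather than creating variables named df_a, df_b and so on dynamically, means the same loop works unchanged however many columns there are.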

Upvotes: 2
