Reputation: 5660
I have a pandas DataFrame as shown below. I want to identify the index values of the columns in df
that match a given string (more specifically, a string that matches the column names after 'sim-' or 'act-').
# Sample df
import pandas as pd
df = pd.DataFrame({
'sim-prod1': [1, 1.4],
'sim-prod2': [2, 2.1],
'act-prod1': [1.1, 1],
'act-prod2': [2.5, 2]
})
# Get unique prod values from df.columns
prods = pd.Series(df.columns[1:]).str[4:].unique()
prods
array(['prod2', 'prod1'], dtype=object)
I now want to loop through prods
and identify the columns where prod1
and prod2
occur, and then use those columns to create new dataframes. How can I do this? In R I could use the which
function to do this easily. Example dataframes I want to obtain are below.
df_prod1
sim_prod1 act_prod1
0 1.0 1.1
1 1.4 1.0
df_prod2
sim_prod2 act_prod2
0 2.0 2.5
1 2.1 2.0
Upvotes: 1
Views: 53
Reputation: 24324
You can also do this without using groupby()
and for loop
by:-
df_prod2=df[df.columns[df.columns.str.contains(prods[0])]]
df_prod1=df[df.columns[df.columns.str.contains(prods[1])]]
Upvotes: 1
Reputation: 153510
Try this, storing the parts of the dataframe as a dictionary:
df_dict = dict(tuple(df.groupby(df.columns.str[4:], axis=1)))
print(df_dict['prod1'])
print('\n')
print(df_dict['prod2'])
Output:
sim-prod1 act-prod1
0 1.0 1.1
1 1.4 1.0
sim-prod2 act-prod2
0 2.0 2.5
1 2.1 2.0
Upvotes: 2
Reputation: 150785
Try groupby with axis=1
:
for prod, d in df.groupby(df.columns.str[-4:], axis=1):
print(f'this is {prod}')
print(d)
print('='*20)
Output:
this is rod1
sim-prod1 act-prod1
0 1.0 1.1
1 1.4 1.0
====================
this is rod2
sim-prod2 act-prod2
0 2.0 2.5
1 2.1 2.0
====================
Now, to have them as variables:
dfs = {prod:d for prod, d in df.groupby(df.columns.str[-4:], axis=1)}
Upvotes: 2