Gaurav Bansal
Gaurav Bansal

Reputation: 5660

Identify where pandas dataframe columns match string

I have a pandas DataFrame as shown below. I want to identify the index values of the columns in df that match a given string (more specifically, a string that matches the column names after 'sim-' or 'act-').

# Sample df
import pandas as pd
df = pd.DataFrame({
    'sim-prod1': [1, 1.4],
    'sim-prod2': [2, 2.1],
    'act-prod1': [1.1, 1],
    'act-prod2': [2.5, 2]
})

# Get unique prod values from df.columns
prods = pd.Series(df.columns[1:]).str[4:].unique()
prods
  array(['prod2', 'prod1'], dtype=object)

I now want to loop through prods and identify the columns where prod1 and prod2 occur, and then use those columns to create new dataframes. How can I do this? In R I could use the which function to do this easily. Example dataframes I want to obtain are below.

df_prod1
    sim_prod1   act_prod1
0   1.0         1.1
1   1.4         1.0

df_prod2
    sim_prod2   act_prod2
0   2.0         2.5
1   2.1         2.0

Upvotes: 1

Views: 53

Answers (3)

Anurag Dabas
Anurag Dabas

Reputation: 24324

You can also do this without using groupby() and for loop by:-

df_prod2=df[df.columns[df.columns.str.contains(prods[0])]]
df_prod1=df[df.columns[df.columns.str.contains(prods[1])]]

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153510

Try this, storing the parts of the dataframe as a dictionary:

df_dict = dict(tuple(df.groupby(df.columns.str[4:], axis=1)))

print(df_dict['prod1'])
print('\n')
print(df_dict['prod2'])

Output:

   sim-prod1  act-prod1
0        1.0        1.1
1        1.4        1.0


   sim-prod2  act-prod2
0        2.0        2.5
1        2.1        2.0

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150785

Try groupby with axis=1:

for prod, d in df.groupby(df.columns.str[-4:], axis=1):
    print(f'this is {prod}')
    print(d)
    print('='*20)

Output:

this is rod1
   sim-prod1  act-prod1
0        1.0        1.1
1        1.4        1.0
====================
this is rod2
   sim-prod2  act-prod2
0        2.0        2.5
1        2.1        2.0
====================

Now, to have them as variables:

dfs = {prod:d for prod, d in df.groupby(df.columns.str[-4:], axis=1)}

Upvotes: 2

Related Questions