Sarin Malcova
Sarin Malcova

Reputation: 103

How to subset a DataFrame based on similar column names

How to subset similar columns in pandas based on keywords like A B C D. Now I have taken this as example is there any better way if new columns were given and logic should work.

df

A1 A2 A3   B1 B2 B3  C1 C2   D1 D2 D3 D4
1   a  x   1   a  x  3   c   7 d  s  4 
2   b  5   2   b  5  4   d   s c  7  d
3   c  7   3   c  7  1   a   x 1  a  x 
4   d  s   4   d  s  b   5   2 b  s  7

Upvotes: 0

Views: 471

Answers (3)

Pablo C
Pablo C

Reputation: 4761

You can use pandas.Index.groupby

groups = df.columns.groupby(df.columns.str[0])
#{'A': ['A1', 'A2', 'A3'], 
# 'B': ['B1', 'B2', 'B3'], 
# 'C': ['C1', 'C2'], 
# 'D': ['D1', 'D2', 'D3', 'D4']}

Then you can access data this way:

df[groups['B']]
#   B1 B2 B3
#0   1  a  x
#1   2  b  5
#2   3  c  7
#3   4  d  s

Keep in mind groups is a dict, so you can use any dict method too.

Upvotes: 1

Vaebhav
Vaebhav

Reputation: 5032

Another approach can be to use df.columns in conjuction with str.contains

a_col_lst = df.columns[df.columns.str.contains('A')]
b_col_lst = df.columns[df.columns.str.contains('B')]

df_A = df.loc[a_col_lst]
df_B = df.loc[b_col_lst]

You can apply regex as well within columns.str.contains

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 521178

You could use filter along with a regex pattern, e.g.

df_A = df.filter(regex=(r'^A.*'))

You could also use select along with startswith:

df_A = df.select(lambda col: col.startswith('A'), axis=1)

Upvotes: 0

Related Questions