JamesHudson81

Reputation: 2273

Filtering columns in dataframe that begin with a specific string

I have the following df, and I would like to filter on the column names and keep only those that begin with a certain string:

This is my current df:

ruta2:
   Current SAN  Prev.1m SAN  Prev.2m SAN  Prev.3m SAN  Current TRE
A            5            6            7            6            3
B            6            5            7            6            6
C           12           11           11           11            8

Basically, I would like to filter the dataframe and keep only the columns that begin with Current.

Then the desired output would be:

ruta2:
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8

To do this I tried the following filter, but it raises a ValueError:

ruta2=ruta2[~(ruta2.columns.str[:4].str.startswith('Prev'))]

Upvotes: 1

Views: 2663

Answers (2)

jezrael

Reputation: 863226

It seems you only need .loc, so that the boolean mask is applied to the columns rather than the rows:

ruta2=ruta2.loc[:, ~(ruta2.columns.str[:4].str.startswith('Prev'))]
#same as
#ruta2=ruta2.loc[:, ~ruta2.columns.str.startswith('Prev')]
print (ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8

Or:

cols = ruta2.columns[ ~(ruta2.columns.str[:4].str.startswith('Prev'))]
ruta2=ruta2[cols]
print (ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
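
To see why the attempt in the question fails, here is a minimal sketch that rebuilds the frame from the values posted above. A boolean array passed directly inside [] is treated as a row filter, so a 5-element column mask does not fit a 3-row frame. The names mask, kept and error_message are only illustrative.

```python
import pandas as pd

# Data copied from the question.
ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

# One boolean per column (5 values), while the frame has only 3 rows.
mask = ~ruta2.columns.str.startswith("Prev")

error_message = None
try:
    ruta2[mask]  # a boolean array inside [] is applied to the rows
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# .loc[:, mask] applies the same mask along the column axis instead.
kept = ruta2.loc[:, mask]
print(kept.columns.tolist())
```

The mask itself is fine; only the axis it is applied to changes between the two calls.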

But if you need only the Current columns, use filter (^ matches the start of the string in a regex):

ruta2=ruta2.filter(regex='^Current')
print (ruta2)
   Current SAN  Current TRE
A            5            3
B            6            6
C           12            8
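
As a rough illustration of why the ^ anchor matters, the sketch below adds one hypothetical column, "Non-Current XYZ", which is not in the question's data. Without the anchor the regex matches anywhere in the label, so that column is kept as well.

```python
import pandas as pd

ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Current TRE": [3, 6, 8],
        # Hypothetical column, added only to show the effect of the anchor.
        "Non-Current XYZ": [0, 0, 0],
    },
    index=["A", "B", "C"],
)

# Anchored: only labels that start with "Current".
anchored = ruta2.filter(regex="^Current")

# Unanchored: any label that contains "Current" somewhere.
unanchored = ruta2.filter(regex="Current")

print(anchored.columns.tolist())
print(unanchored.columns.tolist())
```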

Upvotes: 1

Allen Qin

Reputation: 19957

# keep only the columns whose names start with 'Current'
ruta2[[e for e in ruta2.columns if e.startswith('Current')]]
Out[383]: 
   Current SAN  Current TRE
A           5           3
B           6           6
C          12           8

Or you can use a mask array to filter columns:

ruta2.loc[:,ruta2.columns.str.startswith('Current')]
Out[385]: 
   Current SAN  Current TRE
A           5           3
B           6           6
C          12           8
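
For what it's worth, both approaches should select the same columns. A small self-contained check, using the question's data and assuming all column labels are strings (the list comprehension calls str.startswith on each label directly); by_list and by_mask are just illustrative names:

```python
import pandas as pd

ruta2 = pd.DataFrame(
    {
        "Current SAN": [5, 6, 12],
        "Prev.1m SAN": [6, 5, 11],
        "Prev.2m SAN": [7, 7, 11],
        "Prev.3m SAN": [6, 6, 11],
        "Current TRE": [3, 6, 8],
    },
    index=["A", "B", "C"],
)

# List of matching labels, built with plain Python string methods.
by_list = ruta2[[c for c in ruta2.columns if c.startswith("Current")]]

# Boolean mask over the columns, built with the vectorised .str accessor.
by_mask = ruta2.loc[:, ruta2.columns.str.startswith("Current")]

print(by_list.equals(by_mask))
```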

Upvotes: 0
