duber

Reputation: 2869

Identifying multiple columns by name in Pandas

Is there a way to select a subset of columns using text matching or regular expressions?

In R it would be like this:

# iris ships with base R (the 'Stairway to Heaven' of its built-in data sets), so no attach() is needed
iris[grep("Length", names(iris))] # Prints only the columns whose names contain "Length"

Upvotes: 2

Views: 3959

Answers (2)

joris

Reputation: 139172

You can use the filter method for this (pass axis=1 to filter on the column names). It supports several ways of matching:

  • Equivalent to if 'Length' in col:

    df.filter(like='Length', axis=1)
    
  • Using a regex (note that it uses re.search, not re.match, so the pattern can match anywhere in the name and you may need to anchor it with ^ or $):

    df.filter(regex=r'\.Length$', axis=1)
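A minimal, self-contained sketch of both filter calls, assuming R-style column names such as Sepal.Length; the two-row frame is a made-up stand-in, not the real iris data.

```python
import pandas as pd

# Toy frame with R-style iris column names (values are illustrative only)
df = pd.DataFrame({
    "Sepal.Length": [5.1, 4.9],
    "Sepal.Width": [3.5, 3.0],
    "Petal.Length": [1.4, 1.4],
    "Petal.Width": [0.2, 0.2],
    "Species": ["setosa", "setosa"],
})

# Substring match: keeps every column whose name contains "Length"
by_substring = df.filter(like="Length", axis=1)

# Regex match (re.search under the hood): keeps names ending in ".Length"
by_regex = df.filter(regex=r"\.Length$", axis=1)

print(list(by_substring.columns))  # ['Sepal.Length', 'Petal.Length']
print(list(by_regex.columns))      # ['Sepal.Length', 'Petal.Length']
```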
    

Upvotes: 6

duber

Reputation: 2869

Using Python's in operator inside a list comprehension, it would work like this:

# Assuming iris is already loaded as a DataFrame called 'iris' and has a proper header
lengths = iris[[col for col in iris.columns if 'Length' in col]]
print(lengths.head())

Or, using a regular expression:

import re
# re.search, not re.match: re.match only matches at the start of the string
lengths = iris[[col for col in iris.columns if re.search(r'\.Length$', col)]]
print(lengths.head())
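Because re.match only tries to match at the beginning of the string, a pattern like r'\.Length$' never matches 'Sepal.Length'. A quick sketch comparing the two on R-style iris column names (assumed here, not read from a real file):

```python
import re

# R-style iris column names (assumed for illustration)
cols = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species"]
pattern = r"\.Length$"

# re.match anchors at position 0, so nothing matches: every name starts with a letter, not "."
with_match = [c for c in cols if re.match(pattern, c)]

# re.search scans the whole string, which is what this pattern needs
with_search = [c for c in cols if re.search(pattern, c)]

print(with_match)   # []
print(with_search)  # ['Sepal.Length', 'Petal.Length']
```

If you do want to keep re.match, the pattern needs a leading .* (r'.*\.Length$').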

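The substring test is simpler and probably a little faster, but it matches 'Length' anywhere in a column name; the anchored regex only keeps names ending in '.Length', so it is more precise.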
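To see the precision difference concretely, here is a sketch with a hypothetical extra column, LengthUnits, invented purely for illustration. It also uses df.columns.str.contains, a vectorized pandas alternative to the list comprehension that is not part of this answer's original code.

```python
import pandas as pd

# Hypothetical frame: "LengthUnits" is an invented column that contains "Length"
# but does not end in ".Length"
df = pd.DataFrame(
    [[5.1, 3.5, 1.4, 0.2, "cm"]],
    columns=["Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "LengthUnits"],
)

# Substring test: also picks up "LengthUnits"
loose = [c for c in df.columns if "Length" in c]

# Anchored regex via a boolean mask over the column labels
strict = df.loc[:, df.columns.str.contains(r"\.Length$", regex=True)]

print(loose)                 # ['Sepal.Length', 'Petal.Length', 'LengthUnits']
print(list(strict.columns))  # ['Sepal.Length', 'Petal.Length']
```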

Upvotes: 1
