JohnE
JohnE

Reputation: 30414

Select only the column names that contain a specific string

A simple example should make this obvious. Sample data:

df = pd.DataFrame( np.random.randn(2,6), columns=['x','y','xy','yx','xx','yy'] )

Now, I just want to list the values for columns containing 'x'. Here's a couple of ways:

df[[ x for x in df.columns if 'x' in x ]]
Out[53]: 
          x        xy        yx        xx
0  2.089078  1.111139 -0.218800  1.025810
1 -0.343189  0.274676 -0.342798 -0.503809

df[ df.columns[pd.Series(df.columns).str.contains('x')] ]
Out[54]: 
          x        xy        yx        xx
0  2.089078  1.111139 -0.218800  1.025810
1 -0.343189  0.274676 -0.342798 -0.503809

The latter approach seems promising but it's just really ugly and I haven't so far found a way to shorten it. Something more like this would be great:

df[ columns_with( df, 'x' ) ] 

and in fact I did something just like that with a function, but am wondering if there is a pandastic way to do this without a user written function or monkeypatch?

For motivation/background, this sort of thing is super useful when you have an unfamiliar dataset with lots of columns or even when you have a familiar dataset but can't remember the exact name of one variable out of hundreds. For the situations where I need this functionality, I'll often be doing this over and over again during data exploration stages, so it's really worth it to me to have a quick and simple way to do this.

Upvotes: 4

Views: 7793

Answers (2)

JohnE
JohnE

Reputation: 30414

@DSM's answer fully addresses this question, but the regex kwarg provides a comparable and more general approach than like. For example, to replicate the functionality of like:

df.filter(regex='x')

But you can also do much more, such as:

df.filter(regex='^x')       # x must be the first character
df.filter(regex='X|x')      # includes upper and lower case

Upvotes: 2

DSM
DSM

Reputation: 352989

You could use DataFrame.filter with the like argument:

>>> df.filter(like="x")
          x        xy        yx        xx
0 -1.467867  0.766077  1.210667  1.116529
1 -0.041965  0.546325 -0.590660  1.037151

The like argument means "keep info axis where arg in col == True".

Upvotes: 11

Related Questions