NBC

Reputation: 1698

What is the fastest way to select rows that contain a value in a Pandas dataframe?

I am currently following the instructions laid out here for finding values, and it works. The only problem is that my dataframe is quite big (5x3500 rows) and I need to perform around 2000 searches. Each one takes around 4 seconds, so this adds up and has become unsustainable on my end.

Most concise way to select rows where any column contains a string in Pandas dataframe?

Is there a faster way to search for all rows containing a string value than this?

df[df.apply(lambda r: r.str.contains('b', case=False).any(), axis=1)] 

Upvotes: 6

Views: 2447

Answers (2)

jpp

Reputation: 164713

One trivial possibility is to disable regex:

res = df[df.apply(lambda r: r.str.contains('b', case=False, regex=False).any(), axis=1)] 

Another way using a list comprehension:

res = df[[any('b' in x.lower() for x in row) for row in df.values]]
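
For reference, here is a minimal self-contained sketch of the two options above. The sample dataframe, its column names, and the `needle` variable are made up for illustration, and it assumes every cell is a string (the list comprehension would fail on NaN or numeric cells because it calls `.lower()` on each value):

```python
import pandas as pd

# Hypothetical sample data; every column holds plain strings.
df = pd.DataFrame({
    "col1": ["apple", "Banana", "cherry"],
    "col2": ["dog", "cat", "Bird"],
})

needle = "b"  # illustrative search term, already lowercase

# Option 1: row-wise apply with regex matching disabled.
mask_apply = df.apply(
    lambda r: r.str.contains(needle, case=False, regex=False).any(), axis=1
)

# Option 2: list comprehension over the underlying array of Python strings.
mask_listcomp = [any(needle in x.lower() for x in row) for row in df.values]

# Either mask can be used to select the matching rows.
res = df[mask_listcomp]
```

Both masks should select the same rows; which one is faster on your data is something to measure rather than assume.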

Upvotes: 2

BENY

Reputation: 323306

You can test the speed of this NumPy approach:

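import numpy as np

# Elementwise substring search on the flattened values, reshaped back to rows x columns
boolfilter = (np.char.find(df.values.ravel().astype(str), 'b') != -1).reshape(df.shape).any(axis=1)
boolfilter
array([False,  True,  True])
newdf = df[boolfilter]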
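
Note that `np.char.find` is case-sensitive, whereas the question uses `case=False`. A possible case-insensitive variant, sketched on a made-up dataframe (the data and column names are assumptions, not from the question), lowercases the array first with `np.char.lower`. Since `np.char.find` works elementwise on 2-D arrays, the `ravel`/`reshape` step can be dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "col1": ["apple", "Banana", "cherry"],
    "col2": ["dog", "cat", "Bird"],
})

# Convert all cells to a fixed-width unicode array, then lowercase it.
values = df.to_numpy().astype(str)
lowered = np.char.lower(values)

# Elementwise substring search; a row matches if any of its cells match.
mask = (np.char.find(lowered, "b") != -1).any(axis=1)
newdf = df[mask]
```

One caveat: `astype(str)` turns missing values into the literal string `'nan'`, so a search for `'n'` or `'a'` would match those cells. If the dataframe may contain NaN, filling them first (for example with `df.fillna('')`) avoids that.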

Upvotes: 4
