omdurg
omdurg

Reputation: 330

Issue in applying str.contains across multiple columns in Python

Dataframe:

col1          col2             col3
132jh.2ad3    34.2             65
298.487       9879.87          1kjh8kjn0
98.47         79.8             90
8763.3        7hkj7kjb.k23l    67
69.3          3765.9           3510

Desired output:

col1          col2             col3
98.47         79.8             90
69.3          3765.9           3510

What I have tried: (this doesn't delete all rows with alpha-numeric vales)

df=df[~df['col1'].astype(str).str.contains(r'[A-Ba-b]')] #for col1
df=df[~df['col2'].astype(str).str.contains(r'[A-Ba-b]')] #for col2
df=df[~df['col3'].astype(str).str.contains(r'[A-Ba-b]')] #for col3

I want to delete all alphanumeric rows, and have only the rows containing numbers alone. Col1 and Col2 has decimal points, but Col3 has only whole numbers.
I have tried few other similar threads, but it didn't work.

Thanks for the help!!

Upvotes: 2

Views: 279

Answers (3)

Valdi_Bo
Valdi_Bo

Reputation: 30971

Run:

df[~df.apply(lambda row: row.str.contains(r'[A-Z]', flags=re.I).any(), axis=1)]

(import re required).

Your regex contained [A-B], but it should match all letters (from A to Z).

Edit

If you have also other columns, but you want to limit your criterion to just your 3 indicated columns, assuming that they are consecutive columns, run:

df[~df.loc[:, 'col1':'col3'].apply(lambda row:
    row.str.contains(r'[A-Z]', flags=re.I).any(), axis=1)]

This way you apply the same function as above to just these 3 columns.

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150735

You can just use to_numeric:

df[df.apply(pd.to_numeric, errors='coerce').notnull().all(1)]

Output:

    col1    col2  col3
2  98.47    79.8    90
4   69.3  3765.9  3510

Upvotes: 4

It_is_Chris
It_is_Chris

Reputation: 14103

Here is a solution that does not require the use apply (which can be slow) but rather stack

# stack and use isnumeric to see if str is a number or float
# then unstack and dropna
df[df.stack().str.replace('.','').str.isnumeric().unstack()].dropna()

    col1    col2  col3
2  98.47    79.8    90
4   69.3  3765.9  3510

Upvotes: 0

Related Questions