Reputation: 330
Dataframe:
col1 col2 col3
132jh.2ad3 34.2 65
298.487 9879.87 1kjh8kjn0
98.47 79.8 90
8763.3 7hkj7kjb.k23l 67
69.3 3765.9 3510
Desired output:
col1 col2 col3
98.47 79.8 90
69.3 3765.9 3510
What I have tried: (this doesn't delete all rows with alpha-numeric vales)
df=df[~df['col1'].astype(str).str.contains(r'[A-Ba-b]')] #for col1
df=df[~df['col2'].astype(str).str.contains(r'[A-Ba-b]')] #for col2
df=df[~df['col3'].astype(str).str.contains(r'[A-Ba-b]')] #for col3
I want to delete all alphanumeric rows, and have only the rows containing numbers alone. Col1 and Col2 has decimal points, but Col3 has only whole numbers.
I have tried few other similar threads, but it didn't work.
Thanks for the help!!
Upvotes: 2
Views: 279
Reputation: 30971
Run:
df[~df.apply(lambda row: row.str.contains(r'[A-Z]', flags=re.I).any(), axis=1)]
(import re required).
Your regex contained [A-B], but it should match all letters (from A to Z).
If you have also other columns, but you want to limit your criterion to just your 3 indicated columns, assuming that they are consecutive columns, run:
df[~df.loc[:, 'col1':'col3'].apply(lambda row:
row.str.contains(r'[A-Z]', flags=re.I).any(), axis=1)]
This way you apply the same function as above to just these 3 columns.
Upvotes: 1
Reputation: 150735
You can just use to_numeric
:
df[df.apply(pd.to_numeric, errors='coerce').notnull().all(1)]
Output:
col1 col2 col3
2 98.47 79.8 90
4 69.3 3765.9 3510
Upvotes: 4
Reputation: 14103
Here is a solution that does not require the use apply
(which can be slow) but rather stack
# stack and use isnumeric to see if str is a number or float
# then unstack and dropna
df[df.stack().str.replace('.','').str.isnumeric().unstack()].dropna()
col1 col2 col3
2 98.47 79.8 90
4 69.3 3765.9 3510
Upvotes: 0