Reputation: 346
I noticed that when an element of a pandas DataFrame column consists of several numeric substrings separated by spaces, the isnumeric method returns False.
For example:
row 1, column 1 has the following: 0002 0003 1289
row 2, column 1 has the following: 89060 324 123431132
row 3, column 1 has the following: 890GB 32A 34311TT
row 4, column 1 has the following: 82A 34311TT
row 5, column 1 has the following: 82A 34311TT 889 9999C
Clearly, rows 1 and 2 contain only numbers, but isnumeric returns False for both of them.
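A quick check in plain Python suggests why this happens: the space between the substrings is not a numeric character, so the test fails on the whole string even though each substring passes on its own. This is only a minimal sketch using the built-in str.isnumeric on the first sample value; the variable names are illustrative.

```python
# First sample value from the question
value = "0002 0003 1289"

# Testing the whole string fails because of the spaces
whole = value.isnumeric()

# Testing each whitespace-separated substring succeeds
parts = [s.isnumeric() for s in value.split()]

# The space character itself is not numeric
space = " ".isnumeric()

print(whole, parts, space)
```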
I found a workaround that involves separating each substring into its own column and then creating a boolean column for each, adding the booleans together to reveal whether a row is all numeric or not. This, however, is tedious and my function doesn't look tidy. I also do not want to strip and replace the whitespace (to squeeze all the substrings into just one number) because I need to preserve the original substrings.
Does anyone know of a simpler solution/technique that will correctly tell me that an element with one or more numeric substrings is all numeric? My ultimate goal is to delete these numeric-only rows.
Upvotes: 3
Views: 128
Reputation: 862611
I think you need a list comprehension with split and all to check whether all substrings are numeric:
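# Option 1: Series.apply, True for rows that are NOT all numeric
mask = ~df['a'].apply(lambda x: all([s.isnumeric() for s in x.split()]))
# Option 2: the same test as a plain list comprehension
mask = [not all([s.isnumeric() for s in x.split()]) for x in df['a']]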
To check whether at least one substring is numeric, use any instead:
mask = ~df['a'].apply(lambda x: any([s.isnumeric() for s in x.split()]))
mask = [not any([s.isnumeric() for s in x.split()]) for x in df['a']]
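As a rough end-to-end sketch of the all variant applied to the question's goal (dropping the numeric-only rows), the following builds the sample data and filters it. The column name 'a' follows this answer; the helper name is_all_numeric is hypothetical. One caveat worth labelling: all() of an empty list is True, so the helper assumes that empty or whitespace-only strings should not count as numeric.

```python
import pandas as pd

# Sample data from the question; the column name 'a' follows the answer above
df = pd.DataFrame({"a": [
    "0002 0003 1289",
    "89060 324 123431132",
    "890GB 32A 34311TT",
    "82A 34311TT",
    "82A 34311TT 889 9999C",
]})

def is_all_numeric(text):
    """Hypothetical helper: True if every whitespace-separated part is numeric."""
    parts = text.split()
    # all([]) is True, so guard against empty or whitespace-only strings
    return bool(parts) and all(s.isnumeric() for s in parts)

# Boolean Series marking the numeric-only rows
numeric_only = df["a"].apply(is_all_numeric)

# Keep everything else; the original strings are left untouched
kept = df[~numeric_only]
print(kept)
```

Because the split happens only inside the helper, the original substrings in the column are preserved.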
Upvotes: 2
Reputation: 164673
Here is one way using pd.Series.map, any with a generator expression, str.isdecimal and str.split.
import pandas as pd
df = pd.DataFrame({'col1': ['0002 0003 1289', '89060 324 123431132', '890GB 32A 34311TT',
'82A 34311TT', '82A 34311TT 889 9999C']})
df['numeric'] = df['col1'].map(lambda x: any(i.isdecimal() for i in x.split()))
Note that isdecimal is stricter than isdigit. But you may need to use str.isdigit or str.isnumeric in Python 2.7.
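To illustrate the difference in strictness, here is a small Python 3 comparison of the three built-in string methods. The sample characters are my own choice and not from the original post.

```python
# Sample characters chosen for illustration (not from the original post)
samples = {
    "plain digits": "123",
    "superscript two": "\u00b2",
    "vulgar fraction one half": "\u00bd",
}

# For each sample, record (isdecimal, isdigit, isnumeric)
results = {
    name: (s.isdecimal(), s.isdigit(), s.isnumeric())
    for name, s in samples.items()
}

for name, flags in results.items():
    print(name, flags)
```

Plain digits pass all three tests, the superscript passes isdigit and isnumeric only, and the fraction passes isnumeric only, so isdecimal is the narrowest of the three.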
To remove the rows where the result is False:
df = df[df['col1'].map(lambda x: any(i.isdecimal() for i in x.split()))]
Result
First part of logic:
                    col1  numeric
0         0002 0003 1289     True
1    89060 324 123431132     True
2      890GB 32A 34311TT    False
3            82A 34311TT    False
4  82A 34311TT 889 9999C     True
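Note that with any, row 4 is flagged True because of the single numeric substring 889. If the aim is the one stated in the question (dropping only the rows in which every substring is numeric), one possible alternative is a regular expression that matches whole cells made of digit runs separated by whitespace. This is a sketch, not part of the answer above: the pattern is my own assumption, it does not allow leading or trailing whitespace, and it assumes the column contains no missing values.

```python
import pandas as pd

df = pd.DataFrame({"col1": [
    "0002 0003 1289",
    "89060 324 123431132",
    "890GB 32A 34311TT",
    "82A 34311TT",
    "82A 34311TT 889 9999C",
]})

# Assumed pattern: one or more digit runs separated by whitespace, nothing else
pattern = r"^\d+(?:\s+\d+)*$"

# True where the whole cell consists of numeric substrings only
all_numeric = df["col1"].str.match(pattern)

# Drop the numeric-only rows; the remaining strings are unchanged
df_kept = df[~all_numeric]
print(df_kept)
```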
Upvotes: 1