Andrew

Reputation: 970

Expected behavior of Pandas str.isnumeric()

I have a mixed-type pd.Series like [100, 50, 0, 'foo', 'bar', 'baz'].

When I call .str.isnumeric() on it,

I get [NaN, NaN, NaN, False, False, False]
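A minimal reproduction, assuming the series is built directly from a Python list (the question does not show how it was created):

```python
import pandas as pd

# Mixed-type series from the question: three ints followed by three strings
s = pd.Series([100, 50, 0, "foo", "bar", "baz"])

print(s.dtype)            # object
print(s.str.isnumeric())  # NaN for the ints, False for the strings
```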

Why is this happening? Shouldn't it return True for the first three in this series?

Upvotes: 6

Views: 8992

Answers (2)

jpp

Reputation: 164793

Pandas string methods follow Python methods closely:

str.isnumeric(100)    # TypeError
str.isnumeric('100')  # True
str.isnumeric('a10')  # False

Any element for which the Python method raises an error gives NaN. As the Python docs state, str.isnumeric applies only to strings:

str.isnumeric()
Return true if all characters in the string are numeric characters, and there is at least one character, false otherwise.
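The Python method is stricter than "looks like a number". A quick check on a few sample strings (chosen here only for illustration) shows that signs, decimal points, whitespace and the empty string all give False, while some non-ASCII characters with a Unicode numeric value give True:

```python
# str.isnumeric() checks each character's Unicode numeric property;
# it does not try to parse the string as a number.
samples = ["100", "a10", "", "-1", "1.5", " 7", "\u00bd", "\u00b2"]  # last two are '½' and '²'

for text in samples:
    print(repr(text), text.isnumeric())
```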

As per the Pandas docs, pd.Series.str.isnumeric is equivalent to str.isnumeric:

Series.str.isnumeric()
Check whether all characters in each string in the Series/Index are numeric. Equivalent to str.isnumeric().
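For object-dtype data, this amounts to calling the Python method element by element and putting NaN wherever the call fails. The sketch below imitates that with a hypothetical helper, `isnumeric_or_nan`; it is only an approximation for illustration, not the actual pandas implementation, which also deals with missing values and result dtypes.

```python
import numpy as np
import pandas as pd

def isnumeric_or_nan(value):
    """Hypothetical helper: roughly what .str.isnumeric() does per element."""
    try:
        return str.isnumeric(value)  # raises TypeError for non-str values such as int
    except TypeError:
        return np.nan                # a failed call becomes NaN

s = pd.Series([100, 50, 0, "foo", "bar", "baz"])
approx = s.map(isnumeric_or_nan)
print(approx.tolist())
```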

Your series has "object" dtype. This is an all-encompassing type that holds pointers to arbitrary Python objects, which may be a mixture of strings, integers, etc. Therefore, you should expect NaN for every element that is not a string.

To accommodate numeric types, you need to convert to strings explicitly, e.g. given a series s:

s.astype(str).str.isnumeric()
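Because the check then runs on each value's string form, it reliably flags non-negative integers only. The values below are hypothetical, picked to show that a negative int and a float come out False, since '-' and '.' are not numeric characters:

```python
import pandas as pd

# Hypothetical values: an int, a negative int, a float and two strings
s = pd.Series([100, -5, 2.5, "42", "foo"])

as_text = s.astype(str)                  # string form of each element
print(as_text.tolist())                  # ['100', '-5', '2.5', '42', 'foo']
print(as_text.str.isnumeric().tolist())  # [True, False, False, True, False]
```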

Upvotes: 13

user3483203

Reputation: 51175

The string accessor yields NaN for your numbers no matter which method you call, not just isnumeric. Even a no-op slice shows this:

s = pd.Series([100, 50, 0, 'foo', 'bar', 'baz'])
s.str[:]

0    NaN
1    NaN
2    NaN
3    foo
4    bar
5    baz
dtype: object

So the NaNs remain when you use isnumeric. Convert to strings with astype first instead:

s.astype(str).str.isnumeric()

0     True
1     True
2     True
3    False
4    False
5    False
dtype: bool
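If the underlying goal is to find elements that represent a number, rather than strings made only of numeric characters, a different tool may fit better: pd.to_numeric with errors='coerce' turns anything it cannot parse into NaN. This is a swapped-in technique with different semantics (for example it accepts '-3' and '1.5', which isnumeric rejects); the two extra sample values are hypothetical.

```python
import pandas as pd

s = pd.Series([100, 50, 0, "foo", "bar", "baz", "-3", "1.5"])

# errors="coerce": values that cannot be parsed as numbers become NaN
parsed = pd.to_numeric(s, errors="coerce")
looks_numeric = parsed.notna()
print(looks_numeric.tolist())  # [True, True, True, False, False, False, True, True]
```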

Upvotes: 5
