Tom

Reputation: 111

Assert a pandas series contains only strings

My question seems so obvious I'm surprised it has not been answered before. Sorry if that is the case.

Currently the only assertion I can make is that the series dtype is object ('O'). But that does not guarantee it contains only strings, since a mixed series (for example, ints alongside strings) also has dtype 'O'. Of course there is the brute-force method of checking that each element is a string with apply, but that can be slow and seems complex for such a simple check.

Is there any way to assert that it really contains only strings, apart from a long apply on the series?

Upvotes: 1

Views: 1173

Answers (2)

rafaelc

Reputation: 59274

I would go with all + isinstance as well.

As an alternative, which can be more visual, you can apply the built-in type function:

import pandas as pd

df = pd.DataFrame({'col': [1, 2, '3', '4', '5']})

  col
0   1
1   2
2   3
3   4
4   5

>>> df['col'].apply(type) == str

0    False
1    False
2     True
3     True
4     True
Name: col, dtype: bool

You can then call (df['col'].apply(type) == str).all() to check that every value is a string.


As a final note, comparing with type can yield wrong results for subclasses of str.

For example:

class mystr(str): 
    pass

df = pd.DataFrame({'col': [1, 2, '3', '4', mystr('5')]})

Notice the difference:

>>> df['col'].apply(type) == str
0    False
1    False
2     True
3     True
4    False
Name: col, dtype: bool

whereas

>>> df['col'].apply(lambda s: isinstance(s, str))

0    False
1    False
2     True
3     True
4     True
Name: col, dtype: bool

So isinstance is the preferred checker in general.
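
Since the question is about asserting, the isinstance check can be wrapped in a small helper that fails with a useful message. This is only a sketch: assert_only_strings is a hypothetical name, not something pandas provides, and it assumes that any str subclass should count as a string.

```python
import pandas as pd


def assert_only_strings(series: pd.Series) -> None:
    """Raise AssertionError if any element is not a str (or a str subclass)."""
    # Hypothetical helper, not part of pandas.
    # Collect the index labels of every element that fails the isinstance check.
    bad_labels = [label for label, value in series.items()
                  if not isinstance(value, str)]
    assert not bad_labels, f"non-string values at index {bad_labels}"


# Passes silently: every element is a str.
assert_only_strings(pd.Series(['a', 'b', 'c']))

# Fails: the first two elements are ints.
try:
    assert_only_strings(pd.Series([1, 2, '3']))
    message = None
except AssertionError as exc:
    message = str(exc)
```

Reporting the offending index labels makes a failing assertion easier to debug than a bare True/False.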

Upvotes: 3

Alexander

Reputation: 109546

No easy way, but you can use all with a generator expression:

>>> s = pd.Series(list('ABCD'))
>>> all(isinstance(x, str) for x in s)
True
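
Another option that may be worth trying is pandas.api.types.infer_dtype, which inspects the values and returns a label describing them. The sketch below assumes the label for an all-string series is 'string'; how missing values and str subclasses are treated should be checked on your own pandas version before relying on it. The variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import infer_dtype

all_strings = pd.Series(list('ABCD'))
mixed = pd.Series([1, 2, '3'])

# infer_dtype looks at the actual values rather than the declared dtype.
all_strings_label = infer_dtype(all_strings)
mixed_label = infer_dtype(mixed)

# Assumption: an all-str series is labelled 'string'; a mixed one is not.
assert all_strings_label == 'string'
assert mixed_label != 'string'
```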

Upvotes: 2
