Reputation: 111
My question seems so obvious I'm surprised it has not been answered before. Sorry if that is the case.
Currently, the only assertion I can make is that a series' dtype is object ('O'). But that does not guarantee it contains only strings, since a mixed series, for example one made of strings and ints, also has dtype 'O'. Of course there is the brute-force method of checking that each element is a string with the apply function, but that can be slow and seems complex for such a simple check.
Is there any way to assert that it really contains only strings, apart from a long apply on the series?
Upvotes: 1
Views: 1173
Reputation: 59274
I would go with all + isinstance as well.
As an alternative, which can be more visual, you can apply the built-in type function:
>>> import pandas as pd
>>> df = pd.DataFrame({'col': [1, 2, '3', '4', '5']})
>>> df
  col
0   1
1   2
2   3
3   4
4   5
>>> df['col'].apply(type) == str
0    False
1    False
2     True
3     True
4     True
Name: col, dtype: bool
You can then use (df['col'].apply(type) == str).all() to check that all values are True.
As a final note, comparing with type can yield wrong results for subclasses of str, e.g.:
class mystr(str):
    pass

df = pd.DataFrame({'col': [1, 2, '3', '4', mystr('5')]})
Notice the difference:
>>> df['col'].apply(type) == str
0    False
1    False
2     True
3     True
4    False
Name: col, dtype: bool
But:
>>> df['col'].apply(lambda s: isinstance(s, str))
0    False
1    False
2     True
3     True
4     True
Name: col, dtype: bool
So isinstance is the preferred checker in general.
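To make the difference between the two checks concrete, here is a minimal sketch that wraps both in one function. The names only_strings and allow_subclasses are hypothetical and not part of pandas, and the series are built with dtype=object on the assumption that this matches the object-dtype case from the question.

```python
import pandas as pd


def only_strings(series, allow_subclasses=True):
    """Return True if every element of `series` is a string.

    `only_strings` and `allow_subclasses` are hypothetical names used for
    this illustration; they are not part of pandas.
    """
    if allow_subclasses:
        # isinstance accepts str and any subclass of str
        mask = series.apply(lambda value: isinstance(value, str))
    else:
        # exact type check: rejects subclasses of str
        mask = series.apply(lambda value: type(value) is str)
    return bool(mask.all())


class mystr(str):
    pass


# dtype=object keeps the elements as the original Python objects
mixed = pd.Series([1, 2, '3', '4', mystr('5')], dtype=object)
strings = pd.Series(['3', '4', mystr('5')], dtype=object)

print(only_strings(mixed))                            # False
print(only_strings(strings))                          # True
print(only_strings(strings, allow_subclasses=False))  # False
```

In a test this can be used directly, e.g. assert only_strings(df['col']).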
Upvotes: 3
Reputation: 109546
No easy way, but you can use all with a generator expression:
>>> s = pd.Series(list('ABCD'))
>>> all(isinstance(x, str) for x in s)
True
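Since the goal is an assertion, the same element-wise check can be turned into a helper that stops at the first non-string value and reports where it is. This is only a sketch: assert_all_str is a hypothetical name, and it treats missing values such as NaN as non-strings, which may or may not be what you want.

```python
import pandas as pd


def assert_all_str(series):
    """Raise AssertionError at the first element that is not a string.

    `assert_all_str` is a hypothetical helper for this illustration;
    it is not part of pandas.
    """
    for label, value in series.items():
        if not isinstance(value, str):
            raise AssertionError(
                f"element at index {label} is {type(value).__name__}, not str"
            )


s_ok = pd.Series(list('ABCD'), dtype=object)
# NaN is a float, so it fails the isinstance check
s_nan = pd.Series(['A', float('nan'), 'C'], dtype=object)

assert_all_str(s_ok)  # passes silently

try:
    assert_all_str(s_nan)
except AssertionError as exc:
    print(exc)
```

Like all with a generator, the loop short-circuits, so it does not scan the rest of the series once a non-string value is found.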
Upvotes: 2