Reputation: 224
I import some data from a parquet file into a DataFrame and want to check the data types. One of the data types I expect is strings. To do this, I have something like the following:
import pandas as pd
col = pd.Series([None, 'b', 'c', None, 'e'])
assert((col.dtype == object) and (isinstance(col[0], str)))
But, as you can see, this does not work if I accidentally have a None value at the beginning.
Does anybody have an idea how to do that efficiently (preferably without having to check each element of the series)?
Upvotes: 2
Views: 2440
Reputation: 1531
You can convert all values in the series to str as follows:
col = col.astype(str)
None values will become string values.
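One caveat is worth illustrating: astype(str) converts values rather than checking them, so a stray non-string value is turned into a string as well. The snippet below is a small sketch of that effect; the series `mixed` is a made-up example, not data from the question.

```python
import pandas as pd

# Hypothetical column with a stray integer mixed in with the strings.
mixed = pd.Series([None, 2, 'c', None, 'e'])

converted = mixed.astype(str)

# The integer has been turned into the string '2', so a later
# isinstance(..., str) check can no longer detect that it was there.
int_became_str = isinstance(converted[1], str)
print(int_became_str)
```

If the goal is to validate the column rather than coerce it, the check has to run before any such conversion.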
Upvotes: 0
Reputation: 88226
As of pandas 1.0.0 there's a StringDtype, which you can use to check whether the pd.Series contains only NaN or string values:
try:
    col.astype('string')
except ValueError as e:
    raise e
If you try with a column containing an int:
col = pd.Series([None, 2, 'c', None, 'e'])
try:
    col.astype('string')
except ValueError as e:
    raise e
You'd get a ValueError:
ValueError: StringArray requires a sequence of strings or pandas.NA
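If a plain True/False answer is more convenient than an exception, one possible alternative (not the astype('string') approach shown above) is pandas.api.types.infer_dtype with skipna=True. The helper name `holds_only_strings` is hypothetical, and this is only a sketch; the scan over the values happens inside pandas rather than in a Python loop.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def holds_only_strings(col: pd.Series) -> bool:
    """Return True if infer_dtype reports 'string' once missing values are skipped."""
    # skipna=True makes the inference ignore None/NaN entries.
    return infer_dtype(col, skipna=True) == "string"


print(holds_only_strings(pd.Series([None, 'b', 'c', None, 'e'])))
print(holds_only_strings(pd.Series([None, 2, 'c', None, 'e'])))
```

The first call covers the series from the question, the second the series with an int in it.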
Upvotes: 2
Reputation: 30579
You can use first_valid_index to retrieve and check the first non-NA item. It returns an index label, so look the value up with loc:
isinstance(col.loc[col.first_valid_index()], str)
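Two edge cases may be worth guarding against: first_valid_index returns None when every value is missing, and it returns a label rather than a position. A minimal sketch of a wrapper follows; the name `first_valid_is_str` is hypothetical and unique index labels are assumed. Like the one-liner, it inspects only the first non-missing value, not the whole column.

```python
import pandas as pd


def first_valid_is_str(col: pd.Series) -> bool:
    """Check whether the first non-missing value of col is a str."""
    idx = col.first_valid_index()
    if idx is None:
        # Every value is missing, so there is nothing to inspect.
        return False
    # idx is an index label, hence .loc rather than .iloc.
    return isinstance(col.loc[idx], str)


print(first_valid_is_str(pd.Series([None, 'b', 'c'], index=['x', 'y', 'z'])))
```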
Upvotes: 3