Reputation: 145
I have a large DataFrame, and reading it in produces this warning: DtypeWarning: Columns (0,8) have mixed types. Specify dtype upon import or set low_memory=False.
It is supposed to be a column of floats, but I suspect a few strings snuck in there. I would like to identify them, and possibly remove them.
I tried df.apply(lambda row: isinstance(row.AnnoyingColumn, (int, float)), 1)
But that gave me an out of memory error.
I assume there must be a better way.
Upvotes: 1
Views: 6161
Reputation: 1079
This gives you True for every value that is a float:
df.some_column.apply(lambda x: isinstance(x, float))
This gives you True for every value that is an int or a string:
df.some_column.apply(lambda x: isinstance(x, (int,str)))
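Before filtering, it can help to see which types are actually in the column. A minimal sketch, assuming a made-up sample column (the data here is hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical sample data standing in for the problem column.
df = pd.DataFrame({"some_column": [1.5, 2.0, "oops", 4, "n/a"]})

# Map each value to the name of its Python type, then count each name.
type_counts = df["some_column"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If "str" shows up in the counts, those are the values that triggered the mixed-type warning.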
So, to remove strings:
mask = df.some_column.apply(lambda x: isinstance(x, str))
df = df[~mask]
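A different approach, not the isinstance one above, is to let pandas try to parse the column with pd.to_numeric and errors="coerce", which turns anything unparseable into NaN. A rough sketch on hypothetical data; note that it behaves differently from the mask above, because numeric strings such as "2.5" get converted instead of dropped:

```python
import pandas as pd

# Hypothetical sample data: floats, a numeric string, junk, and a missing value.
df = pd.DataFrame({"some_column": [1.5, "2.5", "oops", 4, None]})

# Parse every value as a number; values that cannot be parsed become NaN.
as_number = pd.to_numeric(df["some_column"], errors="coerce")

# A value is bad if it was present originally but failed to parse.
bad = as_number.isna() & df["some_column"].notna()

# Inspect the offending values before deciding what to do with them.
offenders = df.loc[bad, "some_column"]
print(offenders)

# Drop the bad rows and store the column as real floats.
clean = df.loc[~bad].assign(some_column=as_number[~bad])
print(clean.dtypes)
```

The notna() check keeps values that were already missing from being reported as bad strings.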
Example that removes floats and strings:
$ df = pd.DataFrame({'a': [1,2.0,'hi',4]})
$ df
a
0 1
1 2.0
2 hi
3 4
$ mask = df.a.apply(lambda x: isinstance(x, (float,str)))
$ mask
0 False
1 True
2 True
3 False
Name: a, dtype: bool
$ df = df[~mask]
$ df
a
0 1
3 4
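The same example as a script you can run directly. It works on a single column with Series.apply rather than row by row with df.apply(..., 1), which should be lighter on memory for a large frame, though I have not measured that on the asker's data:

```python
import pandas as pd

# Mixed column: int, float, str, int (stored with object dtype).
df = pd.DataFrame({"a": [1, 2.0, "hi", 4]})

# True where the value is a float or a string.
mask = df["a"].apply(lambda x: isinstance(x, (float, str)))
print(mask.tolist())

# Keep only the rows where the mask is False.
kept = df[~mask]
print(kept)
```

Since 2.0 is a float, the mask is True at index 1 as well as index 2, and only rows 0 and 3 are kept.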
Upvotes: 10