Jonatan Pallesen

Reputation: 145

Pandas: Column with mixed datatypes; how to find the exceptions

I have a large dataframe, and when I read it in, pandas gives me this warning:

DtypeWarning: Columns (0,8) have mixed types. Specify dtype upon import or set low_memory=False.

It is supposed to be a column of floats, but I suspect a few strings snuck in there. I would like to identify them, and possibly remove them.

I tried df.apply(lambda row: isinstance(row.AnnoyingColumn, (int, float)), 1)

but that gave me an out-of-memory error.

I assume there must be a better way.
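A small, hypothetical reproduction of the situation (the CSV text and the column name are made up for illustration): a column of floats with one stray string in it. With low_memory=False the parser infers the column's type in a single pass, so the column comes out non-numeric and every entry, including the numeric-looking ones, is kept as a string.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: floats plus one stray string.
csv_text = "AnnoyingColumn\n1.5\n2.0\noops\n3.25\n"

# low_memory=False parses the whole column in one pass, so there is no
# chunk-to-chunk type disagreement to warn about; the column is simply
# left as text because "oops" cannot be parsed as a number.
df = pd.read_csv(io.StringIO(csv_text), low_memory=False)

print(df["AnnoyingColumn"].dtype)
# Count the Python types of the individual values (all str here).
print(df["AnnoyingColumn"].map(type).value_counts())
```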

Upvotes: 1

Views: 6161

Answers (1)

offwhitelotus

Reputation: 1079

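Apply the check to the single column rather than row by row over the whole DataFrame. This returns a boolean Series that is True wherever the value is a float: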

df.some_column.apply(lambda x: isinstance(x, float))

This one is True wherever the value is an int or a string:

df.some_column.apply(lambda x: isinstance(x, (int,str)))
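To see which Python types actually occur in the column before deciding what to remove, one option (a sketch on made-up data, not part of the original answer) is to map each value to its type and count them, then list the string entries together with their row labels:

```python
import pandas as pd

# Hypothetical column mixing ints, floats (including NaN) and strings.
df = pd.DataFrame({"some_column": [1, 2.0, "hi", 4, "3.5", float("nan")]})

# Map every value to its Python type and count how often each type occurs.
type_counts = df["some_column"].map(type).value_counts()
print(type_counts)

# Select only the string entries; the index shows which rows they are in.
is_str = df["some_column"].map(lambda x: isinstance(x, str))
strings = df.loc[is_str, "some_column"]
print(strings)
```

Note that NaN counts as a float, so missing values show up under float rather than as a separate type.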

So, to remove the rows whose value is a string:

mask = df.some_column.apply(lambda x: isinstance(x, str))
df = df[~mask]
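One caveat, offered as an assumption about how the file was read: if the column came from read_csv, numbers from a chunk that also contained a stray string may be stored as strings such as "3.5", and dropping every str would discard them. A different technique, pd.to_numeric with errors="coerce", treats as exceptions only the values that do not parse as numbers. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical column: real floats mixed with strings, one of which ("3.5")
# is a perfectly good number, plus a missing value.
df = pd.DataFrame({"some_column": [1.0, 2.5, "hi", "3.5", None]})

col = df["some_column"]
# Convert everything that parses as a number; anything else becomes NaN.
numeric = pd.to_numeric(col, errors="coerce")

# Exceptions: values that were present but failed to convert.
bad = col.notna() & numeric.isna()
print(df[bad])

# Drop the exceptions and store the column as proper floats.
df = df[~bad].assign(some_column=numeric[~bad])
print(df.dtypes)
```

Missing values are left alone here (they stay NaN), so only genuinely non-numeric entries are flagged.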

Example that removes floats and strings (2.0 is a float, so its row is dropped too):

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2.0, 'hi', 4]})
>>> df
     a
0    1
1  2.0
2   hi
3    4

>>> mask = df.a.apply(lambda x: isinstance(x, (float, str)))
>>> mask
0    False
1     True
2     True
3    False
Name: a, dtype: bool

>>> df = df[~mask]
>>> df
   a
0  1
3  4

Upvotes: 10
