Reputation: 9407
I have a huge dataset with thousands of rows and hundreds of columns. All of the columns are supposed to hold float values, but I am getting an error because one of them contains a value of type str somewhere. I want to locate this string.
How can I loop through a particular column using Pandas and print only the rows whose value is of type str? I want to find out what the string(s) are so I can convert them to their numerical equivalent.
Upvotes: 4
Views: 4445
Reputation: 210832
If your goal is to convert everything to numerical values, then you can use this approach:
Sample DF:
In [126]: df = pd.DataFrame(np.arange(15).reshape(5,3)).add_prefix('col')
In [127]: df.loc[0,'col0'] = 'XXX'
In [128]: df
Out[128]:
  col0  col1  col2
0  XXX     1     2
1    3     4     5
2    6     7     8
3    9    10    11
4   12    13    14
In [129]: df.dtypes
Out[129]:
col0    object
col1     int32
col2     int32
dtype: object
Solution:
In [130]: df.loc[:, df.dtypes.eq('object')] = df.loc[:, df.dtypes.eq('object')].apply(pd.to_numeric, errors='coerce')
In [131]: df
Out[131]:
   col0  col1  col2
0   NaN     1     2
1   3.0     4     5
2   6.0     7     8
3   9.0    10    11
4  12.0    13    14
In [132]: df.dtypes
Out[132]:
col0    float64
col1      int32
col2      int32
dtype: object
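Coercing replaces the offending strings with NaN, so they can no longer be inspected afterwards. If the goal is to see them first, a minimal sketch along these lines might help. It assumes the same sample data as above (built directly from a dict here), and the names `converted`, `bad_mask`, and `bad_rows` are only illustrative:

```python
import pandas as pd

# Same values as the sample DF above, with one stray string in col0.
df = pd.DataFrame({
    'col0': ['XXX', 3, 6, 9, 12],
    'col1': [1, 4, 7, 10, 13],
    'col2': [2, 5, 8, 11, 14],
})

col = df['col0']

# Values that cannot be parsed as numbers become NaN.
converted = pd.to_numeric(col, errors='coerce')

# A value is "bad" if it became NaN but was not missing to begin with.
bad_mask = converted.isna() & col.notna()

# Rows whose col0 value could not be converted.
bad_rows = df[bad_mask]
print(bad_rows)
```

Note that this flags anything `pd.to_numeric` cannot parse. A string such as `'4'` parses cleanly and would not be reported, which is usually fine when the end goal is conversion.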
Upvotes: 2
Reputation: 323226
Using applymap with type (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):
df = pd.DataFrame({'C1': [1,2,3,'4'], 'C2': [10, 20, '3',40]})
df.applymap(type)==str
Out[73]:
      C1     C2
0  False  False
1  False  False
2  False   True
3   True  False
Each True marks a cell that holds a str.
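For the single-column case asked about in the question, the same type check can be narrowed to one Series. This is a rough sketch rather than the only way to do it; the column name `C2` and the names `is_str` and `str_rows` are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({'C1': [1, 2, 3, '4'], 'C2': [10, 20, '3', 40]})

# Boolean Series: True where the value in C2 is a Python str.
is_str = df['C2'].map(lambda v: isinstance(v, str))

# Keep only the rows where C2 holds a string.
str_rows = df[is_str]
print(str_rows)
```

`Series.map` is used here because it is available across pandas versions, so the sketch does not depend on `applymap`.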
Then use np.where to locate those cells (it returns the row positions and the column positions):
np.where((df.applymap(type)==str))
Out[75]: (array([2, 3], dtype=int64), array([1, 0], dtype=int64))
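The two arrays are positional, so with hundreds of columns it may be easier to translate them back into index labels, column names, and the offending values. A possible sketch, assuming the same `df` as above (`mask` and `located` are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C1': [1, 2, 3, '4'], 'C2': [10, 20, '3', 40]})

# Element-wise check for str, built column by column so it does not
# rely on applymap being available.
mask = df.apply(lambda col: col.map(lambda v: isinstance(v, str)))

# Positions of the True cells: row positions and column positions.
rows, cols = np.where(mask.to_numpy())

# Translate positions into (index label, column name, value) triples.
located = [(df.index[r], df.columns[c], df.iat[r, c]) for r, c in zip(rows, cols)]
for label, column, value in located:
    print(label, column, repr(value))
```

Each printed line gives the row label, the column name, and the string itself, which is what is needed to decide how to convert it.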
Upvotes: 5