Reputation: 4628
I have a pandas df of floats, but due to improper output/errors from the program from which I received the data, a number of rows contain values that are actually strings.
I want to remove these rows from the df with minimal looping. Ideally, I would like to mask all values in the df that are strings and drop any row containing a True. Another option would be to iterate through each row, mask that row, and delete it if the mask contains a True. The worst case would be to loop over each row and also over each value within it to achieve the same task.
Can anyone advise how I could do this the most efficiently?
Something akin to df.iloc[x].istype(str)?
I tried df.loc[row_num].contains(str), but that didn't work.
I know I can loop over every single cell and check isinstance(cell, str), but I would really prefer some kind of masking technique.
As a side note to narrow down any solutions, I don't want to fix any string values to be floats, I just want to delete the entire row.
Thanks in advance.
An example of a problematic row is below; notice the string with two decimal points:
df.loc[516].values
array([890.0, 33.17, 29.64, 78.355, 80.182, 83.196, 86.721,
90.12299999999999, 92.807, '91.705.099', 98.89, 99.007,
99.34200000000001, 99.337, 100.43799999999999, 99.867, '100.625',
100.712, 100.46, 100.427, 101.16799999999999, 100.904, 100.193,
100.255, 100.537, 100.37100000000001, 100.535, 100.584, 101.52,
101.787, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan], dtype=object)
Upvotes: 0
Views: 1502
Reputation: 30589
Using np.isreal and all, we can select all rows where all elements are real, i.e. int or float:
df[df.applymap(np.isreal).all(axis=1)]
Example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, '2', 3], 'b': [10, 20, np.nan]})
df = df[df.applymap(np.isreal).all(axis=1)]
gives
   a     b
0  1  10.0
2  3   NaN
(Caveat: this will also filter out complex numbers, even though they are of course numeric.)
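On newer pandas (2.1+), DataFrame.applymap is deprecated in favor of the element-wise DataFrame.map; a minimal sketch of the same filter using it:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, '2', 3], 'b': [10, 20, np.nan]})
# Element-wise check that each cell is real, then keep rows where all cells pass.
df = df[df.map(np.isreal).all(axis=1)]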
Upvotes: 1
Reputation: 25249
Try map and check the type against str:
df.loc[516].map(type).eq(str).any()
It will return True if any cell in row 516 is of type str.
If you want to check the whole df, just use applymap:
df.applymap(type).eq(str).any(axis=1)
It will return a True/False series mask, one value per row.
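To actually delete the offending rows, invert that mask and index with it; a minimal sketch, assuming df is the frame from the question:

# True for every row that contains at least one string cell.
mask = df.applymap(type).eq(str).any(axis=1)
# Keep only the rows with no string cells.
df = df[~mask]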
Upvotes: 1
Reputation: 712
You could transpose the dataframe, then try to convert each column (which was originally a row) using pd.to_numeric(). If a string cannot be converted to an int or float, it will raise a ValueError. You can catch this exception and delete that column. Something like this:
import pandas as pd

df_transposed = df.T
for col in df_transposed:
    try:
        # Convert the column (originally a row) to numeric values.
        df_transposed[col] = pd.to_numeric(df_transposed[col])
    except ValueError:
        # A value could not be parsed as a number, so drop the whole column.
        df_transposed = df_transposed.drop(columns=[col])
df = df_transposed.T
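A loop-free sketch of the same idea, assuming df is the frame from the question, using errors='coerce' so unparseable cells become NaN instead of raising (caveat: a numeric-looking string such as '100.625' parses cleanly and would survive this check, unlike the type-based masks above):

import pandas as pd

# Coerce each original row (a column of the transpose); unparseable cells become NaN.
coerced = df.T.apply(pd.to_numeric, errors='coerce')

# A cell is bad where coercion produced NaN but the original value was not NaN.
bad = coerced.isna() & df.T.notna()

# bad.any(axis=0) is a boolean Series indexed by df's row labels.
df = df[~bad.any(axis=0)]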
Upvotes: 2