Reputation: 17617
I have a pandas DataFrame, and I suspect that it contains some strings:
>>> d2
1 2 3 4 5 6 7 8 9 10 ... 1771 \
0 0 0 0 0 0 0 0 0 0 0 ... 0
1 0 0 0 0 0 0 0 0 0 0 ... 0
2 0 0 0 0 0 0 0 0 0 0 ... 0
3 0 0 0 0 0 0 0 0 0 0 ... 0
4 0 0 0 0 0 0 0 0 0 0 ... 0
5 0 0 0 0 0 0 0 0 0 0 ... 0
6 0 0 0 0 0 0 0 0 0 0 ... 0
7 0 0 0 0 0 0 0 0 0 0 ... 0
8 0 0 0 0 0 0 0 0 0 0 ... 0
9 0 0 0 0 0 0 0 0 0 0 ... 0
1772 1773 1774 1775 1776 1777 1778 1779 1780
0 0 0 0 0 0 0 1 398 2
1 0 0 0 0 0 0 1 398 2
2 0 0 0 0 0 0 1 398 2
3 0 0 0 0 0 0 1 398 2
4 0 0 0 0 0 0 1 398 2
5 0 0 0 0 0 0 1 398 2
6 0 0 0 0 0 0 1 398 2
7 0 0 0 0 0 0 1 398 2
8 0 0 0 0 0 0 1 398 2
9 0 0 0 0 0 0 1 398 2
[10 rows x 1780 columns]
>>> any(d2.applymap(lambda x: type(x) == str))
True
>>>
I would like to find which elements are strings and, if there are any, remove the columns containing them.
How can I do that?
I get a strange result: all the columns seem to have dtype int or float, yet at the same time some elements appear to be strings. How is this possible?
>>> d2.dtypes.drop_duplicates()
1 int64
1755 float64
dtype: object
>>> any(d2.applymap(lambda x: type(x) == str))
True
Upvotes: 1
Views: 2580
Reputation: 10302
I would say that you are getting a false positive because of the way you call any().
Here is what I would do.
To select all columns that might contain text, use:
df.select_dtypes(include=['object']).columns
Or alternatively:
df.select_dtypes(exclude=['number']).columns
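To actually remove the offending columns (as the question asks), the same select_dtypes call can return the filtered frame directly instead of just the column labels. A minimal sketch with a made-up DataFrame, where column 'b' holds strings and therefore has object dtype:

```python
import pandas as pd

# Hypothetical example: column 'b' holds strings, so its dtype is object.
df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y'], 'c': [2.0, 3.0]})

# Keep only the non-object columns, i.e. drop any string-typed column:
cleaned = df.select_dtypes(exclude=['object'])
print(list(cleaned.columns))  # ['a', 'c']
```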
To check whether any cell in the DataFrame is text, use:
df.applymap(lambda x: isinstance(x, str)).any().any()
Or drop the final .any() to see which columns contain text and which don't:
df.applymap(lambda x: isinstance(x, str)).any()
Calling the built-in any(your_dataframe) (passing the DataFrame itself as the argument) gives a false positive: iterating over a DataFrame yields its column labels, not its cell values, and any non-empty label is truthy.
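The difference is easy to demonstrate on a purely numeric frame (a small sketch; the column names are made up):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})  # purely numeric, no strings

# Iterating a DataFrame yields its column labels, so the built-in any()
# only tests the truthiness of the labels, never the cell values:
print(list(df))                                        # ['a', 'b']
print(any(df.applymap(lambda x: isinstance(x, str))))  # True -- false positive!

# The DataFrame method checks the actual cells:
print(df.applymap(lambda x: isinstance(x, str)).any().any())  # False
```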
Upvotes: 2
Reputation: 109636
Check the type of each column using a list comprehension and then exclude objects:
df[[col for col in df if df[col].dtype != 'O']] # 'O' is letter O (not zero)
I'm not sure I understand your comment below, so I will explain further with a simple example:
d2 = pd.DataFrame({'a': [1, 2], 'b': ['a', 1], 'c': [2, 3]})
>>> d2
a b c
0 1 a 2
1 2 1 3
>>> d2.applymap(lambda x: type(x))
a b c
0 <type 'numpy.int64'> <type 'str'> <type 'numpy.int64'>
1 <type 'numpy.int64'> <type 'int'> <type 'numpy.int64'>
>>> d2.applymap(lambda x: type(x) == str)
a b c
0 False True False
1 False False False
Note that you should use isinstance(x, target_type) to check whether x is of type target_type:
>>> d2.applymap(lambda x: isinstance(x, str))
a b c
0 False True False
1 False False False
Test the type of each column:
>>> [d2[col].dtype for col in d2]
[dtype('int64'), dtype('O'), dtype('int64')]
The solution clearly works:
>>> d2[[col for col in d2 if d2[col].dtype != 'O']]
a c
0 1 2
1 2 3
List all columns that are of type 'object':
>>> [col for col in d2 if d2[col].dtype == 'O']
['b']
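The same filter can also be written as a boolean mask over the dtypes, which avoids the explicit loop (a sketch equivalent to the list comprehension above, using the same toy d2):

```python
import pandas as pd

d2 = pd.DataFrame({'a': [1, 2], 'b': ['a', 1], 'c': [2, 3]})

# Boolean-mask equivalent of the list comprehension: d2.dtypes != 'O'
# is a Series of booleans indexed by column name.
numeric_only = d2.loc[:, d2.dtypes != 'O']
print(list(numeric_only.columns))  # ['a', 'c']
```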
Upvotes: 1