Krzysztof Słowiński

Reputation: 7217

Extract the index values of a DataFrame that are not floats in pandas

I am working with a DataFrame whose index is expected to contain only float values, but I suspect that, for some reason, it also contains values of a different type, which would explain why the index has dtype='object'. I would like to extract the index values that are not floats to check whether this is the case.

Example

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, 1.5, 'a'])

In this case, the extraction should return a list with a single element: ['a'].
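A quick way to confirm the suspicion before extracting anything is to count the Python types of the index labels. The sketch below is an illustration, not part of the question: it uses collections.Counter on the example frame, and the name `type_counts` is hypothetical.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, 1.5, 'a'])

# The mix of floats and a string forces an object-dtype index
print(df.index.dtype)   # object

# Count how many labels there are of each Python type
type_counts = Counter(type(x) for x in df.index)
print(type_counts)
```

Any key other than the float type in `type_counts` points to labels that need a closer look.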

Upvotes: 2

Views: 3811

Answers (4)

jpp

Reputation: 164623

You can coerce the index with pd.to_numeric, then use collections.defaultdict to build a dictionary that maps each non-numeric type to the index values of that type:

from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [4,5,6,7,8]},
                  index=[0.0,1.5,'a',pd.to_datetime('10/05/2018'),'b'])

# Labels that cannot be converted to numbers become NaN
vals = pd.to_numeric(df.index, errors='coerce')
idx = df.index[vals.isnull()]

# Group the non-numeric labels by their Python type
d = defaultdict(list)

for x in idx:
    d[type(x)].append(x)

Then you can, for example, use d.keys() to list all the non-numeric types, or d[str] to get the index values that are strings.

Result

print(d)

defaultdict(list,
            {str: ['a', 'b'],
             pandas._libs.tslibs.timestamps.Timestamp: [Timestamp('2018-10-05 00:00:00')]})
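If integer labels should also count as bogus, a variant can skip the pd.to_numeric coercion and group every label that is not a Python float by its type. This is a sketch, not the answer's own method, and the helper name `group_non_floats` is hypothetical. Unlike the coercion step above, it also reports int labels.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [4, 5, 6, 7, 8]},
                  index=[0.0, 1.5, 'a', pd.Timestamp('2018-10-05'), 'b'])


def group_non_floats(index):
    """Hypothetical helper: map each non-float type in `index` to its labels."""
    groups = defaultdict(list)
    for x in index:
        if not isinstance(x, float):
            groups[type(x)].append(x)
    return groups


groups = group_non_floats(df.index)
print(list(groups))   # the offending types
print(groups[str])    # ['a', 'b']
```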

Upvotes: 1

Chris Adams

Reputation: 18647

With a list comprehension (updated based on Coldspeed's recommendation):

[x for x in df.index if not isinstance(x, float)]
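Note that isinstance(x, float) also accepts numpy.float64, which subclasses Python's float, but rejects numpy.float32. If the labels may carry other NumPy float types, a tuple of types can be passed instead. The sketch below assumes that case; the helper name `non_float_labels` and its `allowed` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def non_float_labels(index, allowed=(float, np.floating)):
    # Keep every label whose type is none of the allowed float types
    return [x for x in index if not isinstance(x, allowed)]


df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, 1.5, 'a'])
print(non_float_labels(df.index))                      # ['a']
print(non_float_labels([np.float32(1.0), 'x']))        # ['x']
```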

Upvotes: 2

cs95

Reputation: 402323

Do you just want a way to find out what kind of bogus data you have? If so, this is enough:

df.index[df.index.str[0].notna()]
Index(['a'], dtype='object')

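This assumes the bogus entries are strings: .str[0] takes the first character of each string label, and anything that is not a string shows up as NaN, so the mask keeps only the string labels.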
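The intermediate result of .str[0] makes the mechanism visible. One caveat worth checking: .str[0] also returns NaN for an empty string, because there is no first character, so an empty-string label slips through. The extra '' label below is added only to illustrate that edge case.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [4, 5, 6, 7]},
                  index=[0.0, 1.5, 'a', ''])

# First character of each label; NaN for non-strings and for ''
first_chars = df.index.str[0]
print(first_chars)

found = df.index[first_chars.notna()]
print(list(found))   # ['a']: the empty-string label is missed
```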

If you want to get rid of the invalid data, coerce the index to numbers and drop the rows that end up as NaN:

m = pd.to_numeric(df.index, errors='coerce').notna()  # or .notnull()
df[m]
     a  b
0.0  1  4
1.5  2  5

Upvotes: 4

jezrael

Reputation: 862511

Use isinstance with map:

# Mark float labels with True, then keep the labels that are not floats
idx = df.index[(df.index.map(lambda x: isinstance(x, float)) == False)]
print(idx)

Index(['a'], dtype='object')
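The == False comparison works, but in pandas a boolean mask is more commonly inverted with ~. A hedged variant: converting the mapped result to a NumPy bool array first keeps ~ a logical NOT even if the mapped Index ends up with object dtype, where ~True would be Python's bitwise -2.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, 1.5, 'a'])

# True for float labels, as a plain NumPy bool array
is_float = df.index.map(lambda x: isinstance(x, float)).to_numpy(dtype=bool)

idx = df.index[~is_float]
print(list(idx))   # ['a']
```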

Upvotes: 3
