Reputation: 7217
I am working with a DataFrame whose index is expected to contain only float values, but I suspect that some values are of a different type, which would explain why the index has dtype='object'.
I would like to extract the index values that are not floats to check whether this is the case.
Example
df = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6]}, index=[0.0,1.5,'a'])
The result of the extraction in this case would be a list containing a single element: ['a'].
Upvotes: 2
Views: 3811
Reputation: 164623
You can use collections.defaultdict to create a dictionary that maps types to values:
from collections import defaultdict
import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [4,5,6,7,8]},
                  index=[0.0,1.5,'a',pd.to_datetime('10/05/2018'),'b'])
# Labels that cannot be coerced to a number become NaN
vals = pd.to_numeric(df.index, errors='coerce')
idx = df.index[vals.isnull()]
# Group the non-numeric labels by their Python type
d = defaultdict(list)
for x in idx:
    d[type(x)].append(x)
Then, for example, you can use d.keys() to extract all non-numeric types, or d[str] to extract the index values that are strings.
Result
print(d)
defaultdict(list,
{str: ['a', 'b'],
pandas._libs.tslibs.timestamps.Timestamp: [Timestamp('2018-10-05 00:00:00')]})
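As a minimal sketch of the same grouping idea applied to the question's original three-row frame, the loop below filters with isinstance instead of pd.to_numeric (that swap is an assumption made here to keep the example small, not part of the answer above). The names by_type, non_float_types and string_labels are illustrative only.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, 1.5, 'a'])

# Group every index label that is not a Python float under its type.
by_type = defaultdict(list)
for label in df.index:
    if not isinstance(label, float):
        by_type[type(label)].append(label)

non_float_types = list(by_type.keys())  # which unexpected types are present
string_labels = by_type[str]            # the labels that are strings
print(non_float_types)
print(string_labels)
```

Looking at the keys first tells you which kinds of unexpected labels exist before you decide how to handle each kind.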
Upvotes: 1
Reputation: 18647
With a list comprehension (updated based on coldspeed's recommendation):
[x for x in df.index if not isinstance(x, float)]
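One thing to keep in mind with a strict float check is that integer labels such as 2 are also reported as non-float. If any real number should count as valid, a looser check against numbers.Real may be closer to what you want. The sketch below compares the two; the index values and variable names are made up for illustration.

```python
import numbers

import pandas as pd

# dtype=object keeps each label as the Python object it was created with.
idx = pd.Index([0.0, 1.5, 2, 'a'], dtype=object)

# Strict check: anything that is not a float is flagged, including the int 2.
not_float = [x for x in idx if not isinstance(x, float)]

# Looser check: only labels that are not real numbers are flagged.
# Note that bool is a subclass of int, so True/False would pass this check.
not_numeric = [x for x in idx if not isinstance(x, numbers.Real)]

print(not_float)
print(not_numeric)
```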
Upvotes: 2
Reputation: 402323
Do you just want a way to figure out what kind of bogus data you have? If so, this is enough.
df.index[df.index.str[0].notna()]
Index(['a'], dtype='object')
Assuming you have string entries, anything that is not a string will show up as NaN.
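To make the intermediate step visible, here is a small sketch on the question's example frame that splits the one-liner into its parts. The variable names are illustrative, and the approach assumes the unwanted labels are plain strings.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, 1.5, 'a'])

# .str[0] takes the first character of each string label and yields NaN
# for labels that are not strings (here, the two floats).
first_chars = df.index.str[0]

# True only where a first character exists, i.e. where the label is a string.
is_string = first_chars.notna()

string_labels = df.index[is_string].tolist()
print(string_labels)
```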
If you want to drop the invalid data, coerce the index to numeric and keep only the rows whose label did not become NaN.
m = pd.to_numeric(df.index, errors='coerce').notna()  # or .notnull()
df[m]
     a  b
0.0  1  4
1.5  2  5
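Coercion and type checking answer slightly different questions: pd.to_numeric also accepts strings that look like numbers, while isinstance only accepts actual float objects. The sketch below uses a modified index containing the string '1.5' (an assumption added here for illustration) to show the difference.

```python
import pandas as pd

# '1.5' is a string that happens to look like a number.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, '1.5', 'a'])

# Coercion keeps every label that can be parsed as a number.
numeric_mask = pd.to_numeric(df.index, errors='coerce').notna()

# A type check keeps only labels that already are Python floats.
type_mask = [isinstance(x, float) for x in df.index]

print(df.loc[numeric_mask])  # rows labelled 0.0 and '1.5'
print(df.loc[type_mask])     # only the row labelled 0.0
```

Which mask is appropriate depends on whether numeric-looking strings should count as valid labels or as bad data.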
Upvotes: 4
Reputation: 862511
Use isinstance with map:
idx = df.index[(df.index.map(lambda x: isinstance(x, float)) == False)]
print(idx)
Index(['a'], dtype='object')
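Since the question asks for a plain list, a possible variation is to turn the mapped result into a boolean array, negate it, and call tolist() on the selection. This is only a sketch; is_float and non_float are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=[0.0, 1.5, 'a'])

# True for each label that is a Python float, as a plain boolean array.
is_float = df.index.map(lambda x: isinstance(x, float)).to_numpy(dtype=bool)

# Negate the mask to select the labels that are not floats.
non_float = df.index[~is_float].tolist()
print(non_float)
```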
Upvotes: 3