Reputation: 1471
I have a data frame that looks like this:
import numpy as np
import pandas as pd
data = {'datetime' : ['2009-07-24 02:00:00', '2009-07-24 03:00:00','2009-07-24 04:00:00'],
'value1' : ['a', np.nan ,'c'],
'value2' : ['d','e','f']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))
missing = df.loc[:, df.columns != ('datetime')]
The data above is just a sample, but let's say I have a lot of missing values in a bigger data set. I want to select all the rows with missing values in the 'value1' column.
missing_index = df[df['value1'].isnull()].index
This code gets me the index labels of all the missing values, but I want the actual rows, in this case the second row.
So I tried:
df[missing_index]
but I get an error saying:
KeyError: "DatetimeIndex(['2009-07-24 03:00:00'], dtype='datetime64[ns]', name='datetime', freq=None) not in index"
Upvotes: 2
Views: 321
Reputation: 2137
The error comes from the fact that df[<something>], when given labels, is used to get columns. When you call df[missing_index], it tries to find the labels in missing_index among the columns (which are also an Index), and they are not there.
The easiest way to do what you want is as @panktijk pointed out in his comment:
df[df['value1'].isnull()]
However, if for some reason (maybe you want to manipulate them) you want to go your way, where you first get the index labels and then use those to pull your sub-dataframe, you could do the following:
df.loc[missing_index]
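To make the comparison concrete, here is a minimal sketch that rebuilds the sample frame from the question and pulls the missing row both ways. The names mask, rows_by_mask and rows_by_label are just illustrative and not from the question.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
data = {'datetime': ['2009-07-24 02:00:00', '2009-07-24 03:00:00', '2009-07-24 04:00:00'],
        'value1': ['a', np.nan, 'c'],
        'value2': ['d', 'e', 'f']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

# Boolean mask: True where 'value1' is missing.
mask = df['value1'].isnull()

# Option 1: a boolean mask inside df[...] selects rows.
rows_by_mask = df[mask]

# Option 2: collect the index labels first, then select rows by label with .loc.
missing_index = df[mask].index
rows_by_label = df.loc[missing_index]

print(rows_by_label)
```

Both options should give the same single row here. Going through .loc is mostly useful if you need the labels themselves for something else first.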
Upvotes: 1
Reputation: 16997
I am using the index to capture the row number (counting from 0):
import pandas as pd
import numpy as np
data = {'datetime' : ['2009-07-24 02:00:00', '2009-07-24 03:00:00','2009-07-24 04:00:00', '2009-07-24 05:00:00'],
'value1' : ['a', np.nan ,'c', np.nan],
'value2' : ['d','e','f', 'g']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))
listofnan = df.index[df['value1'].isnull()].tolist()
for i in listofnan:
    print(df.index.get_loc(i))
result:
1
3
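If a loop is not needed, the same row numbers can probably be obtained in one step. This is only a sketch of an alternative, using np.flatnonzero on the boolean mask; the names mask, positions and missing_rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample frame as above, with two missing values in 'value1'.
data = {'datetime': ['2009-07-24 02:00:00', '2009-07-24 03:00:00',
                     '2009-07-24 04:00:00', '2009-07-24 05:00:00'],
        'value1': ['a', np.nan, 'c', np.nan],
        'value2': ['d', 'e', 'f', 'g']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

# Boolean NumPy array: True where 'value1' is missing.
mask = df['value1'].isnull().to_numpy()

# Zero-based positions of the True entries.
positions = np.flatnonzero(mask).tolist()
print(positions)

# The positions can be fed to .iloc to pull the actual rows.
missing_rows = df.iloc[positions]
print(missing_rows)
```

The positions work with .iloc (position-based), while the index labels from the question work with .loc (label-based).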
Upvotes: 0