Yun Tae Hwang

Reputation: 1471

Selecting data frame rows using a DatetimeIndex

I have a data frame that looks like this:

import numpy as np
import pandas as pd

data = {'datetime' : ['2009-07-24 02:00:00', '2009-07-24 03:00:00', '2009-07-24 04:00:00'],
     'value1' : ['a', np.nan, 'c'],
     'value2' : ['d', 'e', 'f']}
df = pd.DataFrame(data)
# Use the 'datetime' strings as a DatetimeIndex (the column itself is kept)
df = df.set_index(pd.DatetimeIndex(df['datetime']))
# Every column except the now-redundant 'datetime' column
missing = df.loc[:, df.columns != 'datetime']

The data above is just a sample, but let's say I have a lot of missing values in a bigger data set. I want to select all the rows with missing values in the 'value1' column.

missing_index = df[df['value1'].isnull()].index

This code gets me the index labels of all the rows with missing values, but I want the actual rows, in this case the second row.

So I tried:

df[missing_index]

but I get an error saying:

KeyError: "DatetimeIndex(['2009-07-24 03:00:00'], dtype='datetime64[ns]', name='datetime', freq=None) not in index"

Upvotes: 2

Views: 321

Answers (2)

aiguofer

Reputation: 2137

The error comes from the fact that df[<something>] with a list-like of labels is used to select columns. When you call df[missing_index], pandas tries to find the labels in missing_index among the columns (df.columns, which is also an Index), not among the rows.
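A minimal sketch of this behavior, rebuilt from the question's sample data. The exact KeyError message differs between pandas versions, so it is printed rather than quoted here: the same bracket syntax that selects a column by its label fails when given the row timestamps.

```python
import numpy as np
import pandas as pd

data = {'datetime': ['2009-07-24 02:00:00', '2009-07-24 03:00:00', '2009-07-24 04:00:00'],
        'value1': ['a', np.nan, 'c'],
        'value2': ['d', 'e', 'f']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

missing_index = df[df['value1'].isnull()].index

# df[...] with a list of column labels selects columns
print(df[['value2']].shape)  # (3, 1)

# With row labels the lookup still happens in df.columns, so it fails
try:
    df[missing_index]
except KeyError as err:
    print('KeyError:', err)
```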

The easiest way to do what you want is, as @panktijk pointed out in his comment, to index with the boolean mask directly:

df[df['value1'].isnull()]

However, if for some reason (maybe you want to manipulate the index labels first) you want to do it your way, by first getting the index and then using it to pull out the sub-dataframe, use .loc, which selects rows by label:

df.loc[missing_index]
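As a quick check, here is a small sketch using the question's sample data. It confirms that the boolean mask and .loc with the collected index labels return the same rows:

```python
import numpy as np
import pandas as pd

data = {'datetime': ['2009-07-24 02:00:00', '2009-07-24 03:00:00', '2009-07-24 04:00:00'],
        'value1': ['a', np.nan, 'c'],
        'value2': ['d', 'e', 'f']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

mask = df['value1'].isnull()        # boolean Series aligned with the rows
by_mask = df[mask]                  # a boolean mask in df[...] selects rows
missing_index = df[mask].index      # labels of the rows where value1 is NaN
by_label = df.loc[missing_index]    # .loc looks the labels up in the row index

print(by_label)
print(by_mask.equals(by_label))     # True
```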

Upvotes: 1

Frenchy

Reputation: 16997

I use the index to capture the row numbers (starting from 0):

import pandas as pd
import numpy as np

data = {'datetime' : ['2009-07-24 02:00:00', '2009-07-24 03:00:00','2009-07-24 04:00:00', '2009-07-24 05:00:00'],
     'value1' : ['a', np.nan ,'c', np.nan],
     'value2' : ['d','e','f', 'g']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

# Index labels (timestamps) of the rows where 'value1' is NaN
listofnan = df.index[df['value1'].isnull()].tolist()

# Convert each label to its 0-based row position
for i in listofnan:
    print(df.index.get_loc(i))

Result:

1
3
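If you only need the positions, a vectorized alternative avoids the loop (a sketch, not the approach above). NumPy's np.flatnonzero returns the positions of the True values in the mask, and .iloc turns those positions back into rows:

```python
import numpy as np
import pandas as pd

data = {'datetime': ['2009-07-24 02:00:00', '2009-07-24 03:00:00',
                     '2009-07-24 04:00:00', '2009-07-24 05:00:00'],
        'value1': ['a', np.nan, 'c', np.nan],
        'value2': ['d', 'e', 'f', 'g']}
df = pd.DataFrame(data)
df = df.set_index(pd.DatetimeIndex(df['datetime']))

# 0-based positions where the mask is True
positions = np.flatnonzero(df['value1'].isnull().to_numpy())
print(positions)            # [1 3]

# .iloc selects rows by position rather than by label
print(df.iloc[positions])
```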

Upvotes: 0
