Jinhua Wang

Reputation: 1759

Pandas groupby datetime and column then apply generates ValueError

I am trying to group my dataframe by symbol and date, and then apply a function to each group:

import pandas as pd

def group_increment_to_end(x):
    # Keep only the first row of each group
    return x.iloc[0:1]

df = pd.read_csv('stack.csv')
df['TIME_M'] = pd.to_datetime(df['TIME_M'], format='%Y%m%d %H:%M:%S.%f')
df.groupby(['SYM_ROOT', df['TIME_M'].dt.date]).apply(group_increment_to_end)

SYM_ROOT is a categorical column, while TIME_M is a datetime column.

However, I keep getting the following error:

ValueError: Key 2017-01-03 00:00:00 not in level Index([2017-01-03], dtype='object', name=u'TIME_M')

What is causing this? Is it because iloc cannot be used inside a function applied to groups with multiple keys? If I want to iterate through the rows of each group and add rows inside group_increment_to_end, how should I do that if I can't use iloc?

UPDATE:

The dataset can be downloaded here.

| SYM_ROOT | TIME_M                     | BEST_BID | BEST_ASK | increment | genjud_incre | 
|----------|----------------------------|----------|----------|-----------|--------------| 
| A        | 2017-01-03 09:30:00.004712 | 45.91    | 46.12    | 0         | 4680         | 
| AA       | 2017-01-03 09:30:00.004014 | 28.55    | 28.57    | 0         | 4680         | 

Upvotes: 1

Views: 1251

Answers (1)

Jinhua Wang

Reputation: 1759

Thanks to @min2bro I think I know the answer.

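The problem is with df['TIME_M'].dt.date, which returns an object-dtype Series of plain date values. As the error message shows, pandas looks up the key as a timestamp with a zero time (2017-01-03 00:00:00) in an index that holds only the date (2017-01-03), so the lookup fails. I don't know exactly why pandas mishandles the date object here when it is passed to groupby directly.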

What worked for me is to store the date in its own column and group by that column instead.
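
A minimal sketch of that workaround follows. The column name DATE is my own choice, and the small frame is illustrative only: the first two rows come from the table in the question, and the third row is made up so that one group has more than one row.

```python
import pandas as pd

# Illustrative sample: rows 1-2 are from the question's table, row 3 is made up.
df = pd.DataFrame({
    "SYM_ROOT": ["A", "AA", "A"],
    "TIME_M": pd.to_datetime([
        "2017-01-03 09:30:00.004712",
        "2017-01-03 09:30:00.004014",
        "2017-01-03 09:30:01.000000",
    ]),
    "BEST_BID": [45.91, 28.55, 45.92],
    "BEST_ASK": [46.12, 28.57, 46.13],
})

# Store the date part in its own column (DATE is a hypothetical name).
df["DATE"] = df["TIME_M"].dt.date


def group_increment_to_end(x):
    # Keep only the first row of each group, as in the question.
    return x.iloc[0:1]


# Group by column names only, rather than passing a Series as a key.
result = df.groupby(["SYM_ROOT", "DATE"]).apply(group_increment_to_end)

# If only the first row per group is needed, head(1) avoids apply entirely.
first_rows = df.groupby(["SYM_ROOT", "DATE"]).head(1)
```

Depending on the pandas version, apply may warn about, or exclude, the grouping columns in the frame passed to the function, so the head(1) form may be the more robust choice when the function only selects rows.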

Upvotes: 3
