Pandas Time Series and groupby

Question

[Edited to more clearly state root problem, which behaves differently if you use numpy 1.8 as dmvianna points out]

I have a DataFrame that has timestamps and other data. In the end I would like to not use a formatted time as the index, because it messes with matplotlib's 3D plotting. I also want to perform a groupby to populate some flag fields. This is causing me to run into a number of weird errors. The first two examples work as I would expect. Once I bring pd.to_datetime into the picture, it starts throwing errors.

runs as expected:

import pandas as pd
import numpy as np

df = pd.DataFrame({'time':np.random.randint(100000, size=1000),
                    'type':np.random.randint(10, size=1000), 
                    'value':np.random.rand(1000)})

df['high'] = 0

def high_low(group):
    if group.value.mean() > .5:
        group.high = 1
    return group

grouped = df.groupby('type')
df = grouped.apply(high_low)

works fine:

df = pd.DataFrame({'time':np.random.randint(100000, size=1000),
                    'type':np.random.randint(10, size=1000), 
                    'value':np.random.rand(1000)})

df.time = pd.to_datetime(df.time, unit='s')

df['high'] = 0

def high_low(group):
    if group.value.mean() > .5:
        group.high = 1
    return group

grouped = df.groupby('type')
df = grouped.apply(high_low)

throws error: ValueError: Shape of passed values is (3, 1016), indices imply (3, 1000)

df = pd.DataFrame({'time':np.random.randint(100000, size=1000),
                    'type':np.random.randint(10, size=1000), 
                    'value':np.random.rand(1000)})

df.time = pd.to_datetime(df.time, unit='s')
df = df.set_index('time')

df['high'] = 0

def high_low(group):
    if group.value.mean() > .5:
        group.high = 1
    return group

grouped = df.groupby('type')
df = grouped.apply(high_low)

throws error: ValueError: Shape of passed values is (3, 1016), indices imply (3, 1000)

df = pd.DataFrame({'time':np.random.randint(100000, size=1000),
                    'type':np.random.randint(10, size=1000), 
                    'value':np.random.rand(1000)})

df['epoch'] = df.time
df.time = pd.to_datetime(df.time, unit='s')
df = df.set_index('time')
df = df.set_index('epoch')

df['high'] = 0

def high_low(group):
    if group.value.mean() > .5:
        group.high = 1
    return group

grouped = df.groupby('type')
df = grouped.apply(high_low)

Does anyone know what I'm missing or doing wrong?
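
For comparison, here is a minimal sketch of one way to populate the flag without returning modified groups from apply. It assumes the goal is only to set high to 1 for every row whose type group has a mean value above .5. It uses groupby(...).transform('mean') to broadcast each group's mean back onto its rows. The fixed random seed and the use of .values are assumptions added for illustration, and this has not been checked against the pandas/numpy versions that produce the error above.

```python
import numpy as np
import pandas as pd

# Fixed seed is an assumption, only so the sketch is reproducible.
rng = np.random.RandomState(0)

df = pd.DataFrame({'time': rng.randint(100000, size=1000),
                   'type': rng.randint(10, size=1000),
                   'value': rng.rand(1000)})

# Same setup as the failing case: datetime index, possibly with duplicates.
df['time'] = pd.to_datetime(df['time'], unit='s')
df = df.set_index('time')

# transform('mean') returns one value per original row (the mean of that
# row's group), so no per-group frames have to be stitched back together.
group_mean = df.groupby('type')['value'].transform('mean')

# .values assigns by position, so duplicate timestamps in the index
# cannot affect alignment.
df['high'] = (group_mean.values > .5).astype(int)
```

The frame keeps its original 1000 rows and index, and high is constant within each type.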
