Christopher Turnbull

Reputation: 1005

Dataframe Dealing With Missing Time Series Data

My DataFrame holds a time series sampled every minute over a period of ~60 days.

  1. First I want to segment the df into 24 hour periods.

  2. Then I want to plot certain attributes as a waterfall chart, line graphs on top of each other.

I'm thinking of using iloc in a for loop to do this, since the df rows are indexed by time and there are 3600 rows per day. The problem is that I don't know how to assign each slice to its own variable.

for i in range(58):
    df = timethingdf.iloc[809+i*3600:809+(i+1)*3600]

As you can see, I would like df to be a separate DataFrame for each of the 58 slices this loop creates.

I also have no idea how to produce the chart.

Upvotes: 2

Views: 115

Answers (2)

Jeff

Reputation: 2228

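I think what you want is pd.Grouper (called TimeGrouper in older pandas versions, from which it has since been removed):

import pandas as pd

data = {'date':['2004-1-2:10:10:00', '2004-1-2:10:11:00', '2004-1-1:11:11:00', '2004-1-1:11:13:00'], 'foo':[5,6,7,8]}
df = pd.DataFrame(data)
# Parse the strings into timestamps and use them as the index
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d:%H:%M:%S')
df = df.set_index('date')
# Group the rows into calendar-day bins and aggregate each bin
grouped = df.groupby(pd.Grouper(freq='D')).sum()

In [7]: grouped
Out[7]: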
            foo
date
2004-01-01   15
2004-01-02   11

You can then replace .sum() with whatever aggregator you want to use on the grouped subsets.
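If the goal is to keep each day's rows rather than aggregate them, the same grouping can be iterated over instead. The following is only a sketch under stated assumptions: the data is synthetic (three days of one-minute readings), the column name foo is borrowed from the example above, and the name daily is made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the asker's data: one reading per minute for 3 days.
index = pd.date_range("2004-01-01", periods=3 * 24 * 60, freq="min")
timethingdf = pd.DataFrame({"foo": np.arange(len(index))}, index=index)

# Iterating over the groupby yields (bin start, sub-DataFrame) pairs,
# so each calendar day can be stored under its own key.
daily = {
    day.date(): day_df
    for day, day_df in timethingdf.groupby(pd.Grouper(freq="D"))
}

for day, day_df in daily.items():
    print(day, len(day_df))
```

Each value in daily is an ordinary DataFrame, so there is no need for 58 separately named variables. Note that a full day of one-minute samples is 24 * 60 = 1440 rows, so grouping by time avoids hard-coding a row count.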

Upvotes: 0

falsetru

Reputation: 368924

I think you meant this:

for i in range(58):
    df = timethingdf.iloc[809+i*3600:809+(i+1)*3600]
    # Do something with `df` here
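If the slices are needed after the loop, one option is to collect them in a list and then line them up column by column for plotting. This is a rough sketch rather than a definitive solution: the data is synthetic, ROWS_PER_DAY and OFFSET are hypothetical names (the question uses 3600 and 809), and the column foo is an assumed attribute to plot.

```python
import numpy as np
import pandas as pd

ROWS_PER_DAY = 24 * 60  # assumption: one row per minute; adjust to the real data
OFFSET = 0              # hypothetical first row to use (the question uses 809)

# Synthetic stand-in for the asker's data: 3 days of one-minute readings.
index = pd.date_range("2004-01-01", periods=3 * ROWS_PER_DAY, freq="min")
timethingdf = pd.DataFrame({"foo": np.arange(len(index))}, index=index)

# Collect one positional slice per complete day instead of overwriting `df`.
n_days = (len(timethingdf) - OFFSET) // ROWS_PER_DAY
chunks = [
    timethingdf.iloc[OFFSET + i * ROWS_PER_DAY : OFFSET + (i + 1) * ROWS_PER_DAY]
    for i in range(n_days)
]

# One column per day, indexed by row position within the day, so days align.
overlay = pd.DataFrame(
    {f"day {i}": chunk["foo"].to_numpy() for i, chunk in enumerate(chunks)}
)
print(overlay.shape)
```

With matplotlib installed, calling overlay.plot() on the result would draw one line per day on shared axes, which is close to the overlaid line graphs the question describes. Positional slicing assumes no minutes are missing; if some are, grouping by the time index is the safer route.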

Upvotes: 1
