Julian

Reputation: 483

pandas: calculate the daily average, grouped by label

I want to plot a graph with one line per label.

In this example picture, each line represents a distinct label:


The data looks something like this, where datetime goes on the x-axis and count on the y-axis.

datetime, count, label
1656140642, 12, A
1656140643, 20, B
1656140645, 11, A
1656140676, 1, B

Because I have a lot of data, I want to aggregate it into 1-hour or even 1-day chunks.

I'm able to generate the above picture with

# df is dataframe here, result from pandas.read_csv
df.set_index("datetime").groupby("label")["count"].plot()

and I can get a time-range average with

df.set_index("datetime").groupby(pd.Grouper(freq='2min')).mean().plot()

but I'm unable to apply both at once, i.e. one line per label and a time-range average. Can someone point me in the right direction?

Upvotes: 1

Views: 99

Answers (1)

Roim

Reputation: 3066

You can use the .pivot function to reshape the data into a convenient structure where datetime is the index, the different labels are the columns, and count supplies the values.

df.set_index('datetime').pivot(columns='label', values='count')

output:

label          A     B
datetime
1656140642  12.0   NaN
1656140643   NaN  20.0
1656140645  11.0   NaN
1656140676   NaN   1.0

Once your data is in this format, you can perform a simple aggregation over the index (with groupby, resample, or whatever suits you), and it will be applied to each column separately. Plotting the result then just draws a separate line for each column.
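For example, here is a minimal sketch of the whole pipeline using the sample rows from the question. It assumes the datetime column holds Unix epoch seconds (my guess from the sample values), so it converts them first, since resample needs a real DatetimeIndex. The names wide and hourly are just illustrative.

```python
import pandas as pd

# Sample rows from the question.
df = pd.DataFrame(
    {
        "datetime": [1656140642, 1656140643, 1656140645, 1656140676],
        "count": [12, 20, 11, 1],
        "label": ["A", "B", "A", "B"],
    }
)

# Assumption: the timestamps are Unix epoch seconds.
# Convert them so the index becomes a DatetimeIndex.
df["datetime"] = pd.to_datetime(df["datetime"], unit="s")

# One column per label, indexed by time.
wide = df.set_index("datetime").pivot(columns="label", values="count")

# Average each label's counts per hour (use "1D" for daily chunks).
# mean() skips the NaN cells that pivot introduces.
hourly = wide.resample("1h").mean()

# hourly.plot() would then draw one line per label (needs matplotlib).
```

With the four sample rows, everything falls into the same hour, so hourly has a single row with A = 11.5 and B = 10.5. Note that pivot raises an error if two rows share the same timestamp and label; in that case pivot_table with aggfunc="mean" may be a better fit.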

Upvotes: 2
