Todd Houstein

Reputation: 89

How to bar plot grouped by two variables

I have 15-minute timestep data of a quantity for several years...

Datetime Quantity
01/07/2018 00:15 6.96
01/07/2018 00:30 6.48
01/07/2018 00:45 6.96
01/07/2018 01:00 6.72
. .
. .

I am using pandas. How do I produce a bar plot with months on the horizontal axis and one series (set of bars) per year, where the height of each bar is the total quantity for that month and year?

Exactly like this:

example plot

Upvotes: 2

Views: 3066

Answers (2)

Zephyr

Reputation: 12496

Fake dataframe creation:

df = pd.DataFrame()
df['Datetime'] = pd.date_range(start = '2018-01-07', end = '2021-08-13', freq = '15min')
df['Quantity'] = np.random.rand(len(df))

Starting from this point, you should extract month and year and save them in separate columns:

df['month'] = df['Datetime'].dt.month
df['year'] = df['Datetime'].dt.year

Then you have to compute the sum of 'Quantity' by month for each year:

df = df.groupby(by = ['month', 'year'])['Quantity'].sum().reset_index()

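After this step, the dataframe has one row per (month, year) pair, with the columns month, year and Quantity. For reference, this is what it looked like before the aggregation, with the month and year columns added (first rows only):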

             Datetime  Quantity  month  year
0 2018-01-07 00:00:00  0.226113      1  2018
1 2018-01-07 00:15:00  0.222872      1  2018
2 2018-01-07 00:30:00  0.835484      1  2018
3 2018-01-07 00:45:00  0.775771      1  2018
4 2018-01-07 01:00:00  0.972559      1  2018
5 2018-01-07 01:15:00  0.418036      1  2018
6 2018-01-07 01:30:00  0.902843      1  2018
7 2018-01-07 01:45:00  0.012441      1  2018
8 2018-01-07 02:00:00  0.883437      1  2018
9 2018-01-07 02:15:00  0.183561      1  2018
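
To make the shape of the aggregated result concrete, here is a minimal sketch on a tiny hand-made frame. The values are made up for illustration and are not the question's data; only the column names follow the steps above.

```python
import pandas as pd

# Tiny hand-made frame (illustrative values, not the question's data)
raw = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2018-01-07 00:00", "2018-01-07 00:15",
        "2018-02-01 00:00", "2019-01-01 00:00",
    ]),
    "Quantity": [1.0, 2.0, 4.0, 8.0],
})
raw["month"] = raw["Datetime"].dt.month
raw["year"] = raw["Datetime"].dt.year

# Same aggregation as above: one row per (month, year) pair
totals = raw.groupby(by=["month", "year"])["Quantity"].sum().reset_index()

print(totals.columns.tolist())  # ['month', 'year', 'Quantity']
print(len(totals))              # 3
```

The Datetime column is gone after the groupby, because only Quantity is summed and only month and year are restored by reset_index.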

Now the dataframe is ready to be plotted with seaborn:

fig, ax = plt.subplots()

sns.barplot(ax = ax, data = df, x = 'month', y = 'Quantity', hue = 'year')

plt.show()
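
If seaborn is not available, a similar grouped bar chart can probably be obtained with pandas alone. This is a different technique from the seaborn call above: the long-format totals are pivoted to a wide table, and pandas' own plotting (which uses matplotlib) draws the bars. The numbers below are made up for illustration; the column names are assumed to match the groupby result.

```python
import pandas as pd

# Illustrative monthly totals in the long format produced by the groupby step
# (made-up numbers; column names follow the answer above)
totals = pd.DataFrame({
    "month": [1, 1, 2, 2],
    "year": [2018, 2019, 2018, 2019],
    "Quantity": [3.0, 8.0, 4.0, 5.0],
})

# Reshape to wide format: one row per month, one column per year
wide = totals.pivot(index="month", columns="year", values="Quantity")

# pandas draws one group of bars per row and one bar per column
ax = wide.plot.bar(rot=0)
ax.set_xlabel("month")
ax.set_ylabel("Quantity")
```

To display the figure, call plt.show() after importing matplotlib.pyplot as plt, as in the seaborn version.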

Complete Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


df = pd.DataFrame()
df['Datetime'] = pd.date_range(start = '2018-01-07', end = '2021-08-13', freq = '15min')
df['Quantity'] = np.random.rand(len(df))
df['month'] = df['Datetime'].dt.month
df['year'] = df['Datetime'].dt.year

df = df.groupby(by = ['month', 'year'])['Quantity'].sum().reset_index()


fig, ax = plt.subplots()

sns.barplot(ax = ax, data = df, x = 'month', y = 'Quantity', hue = 'year')

plt.show()


Upvotes: 5

JBVasc

Reputation: 63

Perhaps you can extract months and years into new columns, draw one bar series per year with months on the x axis, and combine them all in a single plot. Take a look at the example below, and notice the width parameter and the displacement by the same value in plt.bar, so that the bars don't cover each other.

import pandas as pd
import matplotlib.pyplot as plt
import datetime

# create df
d1 = datetime.date(2018, 8, 30)
d2 = datetime.date(2018, 9, 30)
d3 = datetime.date(2019, 8, 30)
d4 = datetime.date(2019, 9, 30)

df = pd.DataFrame({
    'date': [d1, d1, d2, d2, d3, d3, d4, d4],
    'values':[10, 20, 40, 40, 50, 55, 65, 70]})

df['month'] = df.date.apply(lambda x: x.month)
df['year'] = df.date.apply(lambda x: x.year)

# make plots
width = 0.4  # bar width; each year's bars are displaced by this amount
fig, ax = plt.subplots()
for i, year in enumerate([2018, 2019]):
    # total of 'values' per month for this year (index = month number)
    totals = df[df.year == year].groupby('month')['values'].sum()
    plt.bar(totals.index + (i - 0.5) * width, totals, width=width, label=str(year))
plt.legend()
plt.show()

Creating new columns as I did may not be very efficient if you have a very large dataframe. To make the plots, I filtered the rows by year, grouped them by month and summed the values. The index of each grouped result is the month number, which is used as the x position of the bars.
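
The side-by-side placement can be generalised to any number of years. The sketch below is only an illustration: bar_offsets is a hypothetical helper, not part of matplotlib, and the 0.8 total group width is an arbitrary choice that leaves a gap between months.

```python
import numpy as np

def bar_offsets(n_groups, width):
    """Return x-offsets that centre n_groups bars of the given width on a tick."""
    return (np.arange(n_groups) - (n_groups - 1) / 2) * width

# Hypothetical example: three years sharing each month's slot
years = [2018, 2019, 2020]
width = 0.8 / len(years)  # all bars of one month together span 0.8 x-units
offsets = bar_offsets(len(years), width)
```

Each year's bars would then be drawn at the month positions plus offsets[i], with the same width passed to plt.bar.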

Upvotes: 1
