SooWoo

Reputation: 71

Plot a column with labels of another column over time in Pandas

I have a table in a pandas DataFrame and want to plot one column over a time column, with each line labeled by the values of another column.

For example:

fruit    fruit_count  datestamp
apple    20           03-2018
kiwi     10           03-2018
mango    35           03-2018
apple    16           04-2018
kiwi     18           04-2018
mango    40           04-2018
.        .            .
.        .            .
apple    50           03-2020
kiwi     70           03-2020
mango    120          03-2020

It would be one plot where the x-axis is the datestamp (03-2018, 04-2018, ..., 03-2020), with 3 lines: one each for apple, kiwi, and mango, each with its own label.

Currently, I get the unique fruit names from the dataframe

fruits = list(set(fruit_df['fruit'].tolist()))

and then loop through them and plot each one:

for fruit in fruits:
    fruit_df[fruit_df['fruit'] == fruit].plot(x='datestamp', y='fruit_count')

Is there a better way to do this, ideally in one line, that plots everything on one graph instead of 3 separate ones?

Upvotes: 2

Views: 6796

Answers (2)

Sajan

Reputation: 1267

If the combination of datestamp and fruit is not unique:

fruit_df.groupby(['datestamp', 'fruit'])['fruit_count'].sum().unstack().plot(kind='bar')
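As a small sketch of what that reshaping produces before plotting, here it is on a few made-up rows (the values are illustrative, not taken from the question). Duplicate (datestamp, fruit) pairs are summed, and unstack moves each fruit into its own column:

```python
import pandas as pd

# Hypothetical sample rows; apple appears twice for 03-2018.
fruit_df = pd.DataFrame({
    'fruit': ['apple', 'apple', 'kiwi', 'apple', 'kiwi'],
    'fruit_count': [20, 5, 10, 16, 18],
    'datestamp': ['03-2018', '03-2018', '03-2018', '04-2018', '04-2018'],
})

# Sum duplicates per (datestamp, fruit), then spread fruits into columns.
wide = fruit_df.groupby(['datestamp', 'fruit'])['fruit_count'].sum().unstack()
print(wide)
```

Each column of the resulting frame becomes one labeled series when .plot() is called on it.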

If the combination is unique, this should also work:

fruit_df.pivot(index='datestamp', columns='fruit', values='fruit_count').plot(kind='bar')
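One caveat, assuming datestamp is stored as 'MM-YYYY' strings as in the question's table: sorted as text, '01-2019' comes before '03-2018', so the x-axis may not be chronological. A minimal sketch of parsing the column to real dates first (the sample rows are made up):

```python
import pandas as pd

# Hypothetical sample rows with 'MM-YYYY' string datestamps.
fruit_df = pd.DataFrame({
    'fruit': ['apple', 'kiwi', 'apple', 'kiwi'],
    'fruit_count': [20, 10, 50, 70],
    'datestamp': ['03-2018', '03-2018', '01-2019', '01-2019'],
})

# Parse the strings so that ordering is chronological, not alphabetical.
fruit_df['datestamp'] = pd.to_datetime(fruit_df['datestamp'], format='%m-%Y')

wide = fruit_df.pivot(index='datestamp', columns='fruit', values='fruit_count')
print(wide)
```

The same conversion can be applied before the groupby version.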

Upvotes: 0

ALollz

Reputation: 59569

You have a few options. If you really want a one-line solution, you'll want seaborn, or to reshape your data using pivot.

Sample Data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

N = 20
df = pd.DataFrame({'fruit': ['apple', 'kiwi', 'mango']*N,
                   'date_stamp': np.repeat(pd.date_range('2010-01-01', freq='1M',  periods=N), 3),
                   'fruit_count': np.random.randint(1,100, N*3)})

Seaborn

You use hue to specify the groups.

sns.lineplot(data=df, hue='fruit', x='date_stamp', y='fruit_count')

pandas.DataFrame.groupby

Similar to your current implementation, but you can use groupby to split into the sub-Frames.

fig, ax = plt.subplots()
for fruit, gp in df.groupby('fruit'):
    gp.plot(x='date_stamp', y='fruit_count', ax=ax, label=fruit)

pandas.pivot

Pivot before plotting; then you need only a single plot call.

df.pivot(index='date_stamp', columns='fruit', values='fruit_count').plot()
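To see why one call is enough, here is a hedged sketch on a smaller version of the sample data (deterministic counts, and the 'MS' month-start frequency, are my substitutions). The pivoted frame has one column per fruit, and .plot() draws one labeled line per column on a single Axes, which it returns for further tweaks:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs without a display
import numpy as np
import pandas as pd

N = 4
df = pd.DataFrame({'fruit': ['apple', 'kiwi', 'mango'] * N,
                   'date_stamp': np.repeat(pd.date_range('2010-01-01', freq='MS', periods=N), 3),
                   'fruit_count': np.arange(1, N * 3 + 1)})

# One row per date, one column per fruit.
wide = df.pivot(index='date_stamp', columns='fruit', values='fruit_count')

# A single call draws all three lines on the same Axes.
ax = wide.plot()
ax.set_ylabel('fruit_count')
```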

Output

The axes and labeling differ slightly between methods. This is the groupby output (image omitted).

Upvotes: 2
