Reputation: 393

Matplotlib line chart with count?

I am trying to plot a line chart of a large data set where I want the y-axis to show a count.

This is a mock df:

my = pd.DataFrame(np.array(
   [['Apple', 1], 
    ['Kiwi',  2],
    ['Clementine', 3],
    ['Kiwi', 1], 
    ['Banana',  2], 
    ['Clementine', 3],
    ['Apple',  1], 
    ['Kiwi',  2]]), 
                    columns=['fruit', 'cheers'])

I would like the plot to use 'cheers' as the x-axis, with one line for each 'fruit' and, on the y-axis, the number of times that fruit received each 'cheers' value.

EDIT: A line graph might not be the best choice; if so, please advise. I would like something like this:

In the big data set there would maybe be one, but not several, "zeros"; maybe I should have made a bigger mock df.

Upvotes: 1

Answers (4)

Sheldore

Reputation: 39062

An alternative that reproduces the figure you posted, with every curve starting from 0, is the following. The idea is to count how often each cheers value occurs for each fruit and to store those counts in a dictionary.

from collections import Counter
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Define the dataframe here
# my = pd.DataFrame(...)

cheers = np.array(my['cheers'])

for fr in np.unique(my['fruit']):
    freqs = Counter(cheers[np.argwhere(my['fruit']==fr)].flatten()) # Count the frequency
    init_dict = {'0': 0}
    init_dict.update({i: 0 for i in np.unique(cheers)}) # Initialize the dictionary with 0 values
    for k, v in freqs.items():
        init_dict[k] = v # Update the values of cheers
    plt.plot(list(init_dict.keys()), list(init_dict.values()), '-o', label=fr) # Plot each fruit line

plt.legend()
plt.yticks(range(4))
plt.show()

Upvotes: 1

onion_man

Reputation: 36

The code below plots one line per 'fruit', where the x coordinate is the 'cheers' value and the y coordinate is the number of times that fruit received that value.

First, the dataframe is grouped by fruit to get the list of cheers per fruit. Next, a histogram is computed and plotted for each list of cheers. The max_cheers_count is used in order to ensure the same x coordinates for all plotted lines.

Note: see @Heike's answer below for a more pythonic solution.

import matplotlib.pyplot as plt
import numpy as np

# convert 'cheers' column to int
my.cheers = my['cheers'].astype(int)

# computes maximal cheers value, to use later for the histogram
max_cheers_count = my['cheers'].max()

# get cheer counts per fruit
cheer_counts = my.groupby('fruit').apply(lambda x: x['cheers'].values)

# for each fruit compute histogram of cheer counts and plot it
plt.figure()
for fruit, values in cheer_counts.items():  # Series.iteritems() was removed in pandas 2.0
    counts, bin_edges = np.histogram(values, bins=range(1, max_cheers_count + 2))
    plt.plot(bin_edges[:-1], counts, marker='o', label=fruit)
plt.xlabel('cheers')
plt.ylabel('counts')
plt.legend()
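
To make the binning concrete, here is a minimal sketch of what `np.histogram` returns for a single fruit. It assumes the 'cheers' values have already been converted to integers; `kiwi_cheers` is a hypothetical name holding the three Kiwi rows from the mock df.

```python
import numpy as np

# Hypothetical stand-in for the 'Kiwi' rows of the mock df (cheers as ints)
kiwi_cheers = np.array([2, 1, 2])
max_cheers_count = 3  # largest cheers value in the mock df

# Integer bin edges 1, 2, 3, 4 give one bin per cheers value
counts, bin_edges = np.histogram(kiwi_cheers, bins=range(1, max_cheers_count + 2))

print(bin_edges[:-1])  # left bin edges, used as the x coordinates
print(counts)          # occurrences of each cheers value, used as the y coordinates
```

Using the same `bins` for every fruit is what keeps the x coordinates aligned across lines, even for fruits that never received a given cheers value.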

Upvotes: 1

Heike

Reputation: 24420

I see you already accepted an answer, but an alternative way to do this is something like

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

my = pd.DataFrame(np.array([['Apple', 1],
                            ['Kiwi',  2],
                            ['Clementine', 3],
                            ['Kiwi', 1],
                            ['Banana',  2],
                            ['Clementine', 3],
                            ['Apple',  1],
                            ['Kiwi',  2]]),
                  columns=['fruit', 'cheers'])

my_pivot = my.pivot_table(index = 'cheers', 
                          columns = 'fruit', 
                          fill_value = 0, 
                          aggfunc={'fruit':len})['fruit']
my_pivot.plot.line()
plt.tight_layout()
plt.show()
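
If the pivot table call gives trouble on your pandas version, the same count table can probably be built with `pd.crosstab`, which computes frequencies by default. This is a sketch of a swapped-in technique, not the code above; it assumes the dataframe is created from a dictionary so that 'cheers' holds integers.

```python
import pandas as pd

my = pd.DataFrame({
    'fruit': ['Apple', 'Kiwi', 'Clementine', 'Kiwi',
              'Banana', 'Clementine', 'Apple', 'Kiwi'],
    'cheers': [1, 2, 3, 1, 2, 3, 1, 2],
})

# Rows are cheers values, columns are fruits, cells are occurrence counts;
# combinations that never occur are filled with 0.
counts = pd.crosstab(my['cheers'], my['fruit'])
print(counts)
```

Calling `counts.plot.line(marker='o')` on the result should then draw one line per fruit, as with `my_pivot`.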


Upvotes: 2

SpghttCd

Reputation: 10880

my.groupby('fruit').sum().plot.barh()

Note that the numbers in your example dataframe are stored as strings, so you should convert them to int first with

my.cheers = my.cheers.astype(int)

As far as I can see, this is because you initialize the dataframe from a 2D array: NumPy coerces the mixed strings and integers to a single string dtype.
You can avoid this by creating the dataframe from a dictionary instead:

my = pd.DataFrame(
{'fruit': ['Apple', 'Kiwi', 'Clementine', 'Kiwi', 'Banana', 'Clementine', 'Apple', 'Kiwi'],
'cheers': [1, 2, 3, 1, 2, 3, 1, 2]})
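
A quick way to check the difference is to compare the column dtypes of the two constructions. The sketch below uses a shortened copy of the mock data; `rows`, `from_array` and `from_dict` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

rows = [['Apple', 1], ['Kiwi', 2], ['Clementine', 3]]

# Going through a 2D NumPy array forces one common dtype, so the ints become strings
from_array = pd.DataFrame(np.array(rows), columns=['fruit', 'cheers'])

# A dictionary of columns lets each column keep its own dtype
from_dict = pd.DataFrame({'fruit': [r[0] for r in rows],
                          'cheers': [r[1] for r in rows]})

print(from_array['cheers'].dtype)  # a non-numeric (string/object) dtype
print(from_dict['cheers'].dtype)   # an integer dtype

# Converting after the fact recovers the numbers
cheers_as_int = from_array['cheers'].astype(int)
```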

Upvotes: 1
