Maurice Stam

Reputation: 79

How to plot histogram of multiple lists?

I have a dataset with 13k Kickstarter projects and their tweets over the duration of a project. Each project contains a list with the number of tweets for each day, e.g. [10, 2, 4, 7, 2, 4, 3, 0, 4, 0, 1, 3, 0, 3, 4, 0, 0, 2, 3, 2, 0, 4, 5, 1, 0, 2, 0, 2, 1, 2, 0].

I've taken a subset of the data by restricting it to projects with a duration of 31 days, so that each list has the same length (31 values).

This piece of code prints each list of tweets:

    for project in data:
        print(data[project]["tweets"])

What is the easiest way to plot a histogram with matplotlib? I need a frequency distribution of the total number of tweets for each day. How do I sum the values at each index across all lists? Is there an easy way to do this using Pandas?

The lists are also accessible in a Pandas data frame:

    df = pd.DataFrame.from_dict(data, orient='index')
    df1 = df[['tweets']]

Upvotes: 2

Views: 4308

Answers (1)

Ilya Peterov

Reputation: 2065

A histogram is probably not what you need. It's a good solution if you have a list of numbers (for example, IQs of people) and you want to assign each number to a category (e.g. 79-, 80-99, 100+). There will be 3 bins, and the height of each bin will represent how many numbers fall into the corresponding category.
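To make the binning idea concrete, here is a minimal sketch of the counting that a histogram does for you. The IQ scores are made up purely for illustration, and the bin edges simply mirror the three categories mentioned above.

```python
from bisect import bisect_right

# Made-up IQ scores, purely for illustration (not from the question's data).
scores = [72, 85, 95, 101, 110, 130, 78, 99]

# Lower edges of the second and third bins: 79 and below, 80-99, 100 and above.
edges = [80, 100]
labels = ["79-", "80-99", "100+"]

counts = [0] * len(labels)
for score in scores:
    # bisect_right returns how many edges are <= score, which is the bin index.
    counts[bisect_right(edges, score)] += 1

print(dict(zip(labels, counts)))  # {'79-': 2, '80-99': 3, '100+': 3}
```

The point is that a histogram starts from raw values and derives the bar heights by counting, whereas the per-day tweet totals are already the bar heights.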

In your case, you already have the height of each bin, so (as I understand) what you want is a plot that looks like a histogram. As far as I know, plt.hist is not meant for this: it expects the raw values and computes the bin heights itself, so feeding it precomputed totals would mean using it in a way it was not intended to be used.

If you're OK with using a line plot instead of a histogram, here is what you can do:

    import matplotlib.pyplot as plt

    lists = [data[project]["tweets"] for project in data]  # Collect all lists into one
    sum_list = [sum(x) for x in zip(*lists)]  # Create a list with sums of tweets for each day

    plt.plot(sum_list)  # Create a plot for sum_list
    plt.show()  # Show the plot

If you want the plot to look like a histogram, use a bar chart:

    plt.bar(range(len(sum_list)), sum_list)

instead of plt.plot.
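Since the question also asks about Pandas, the same per-day totals could probably be computed from the data frame as well. This is only a sketch: the `data` dict below is a hypothetical stand-in with three projects and 3-day lists, shaped like the structure described in the question, and the names `per_day` and `daily_totals` are my own.

```python
import pandas as pd

# Hypothetical stand-in for the question's `data` dict (3-day lists for brevity).
data = {
    "project_a": {"tweets": [10, 2, 4]},
    "project_b": {"tweets": [1, 0, 3]},
    "project_c": {"tweets": [5, 5, 0]},
}

df = pd.DataFrame.from_dict(data, orient="index")

# Expand the column of lists into one column per day (one row per project).
per_day = pd.DataFrame(df["tweets"].tolist(), index=df.index)

# Sum down the rows to get the total number of tweets for each day.
daily_totals = per_day.sum(axis=0)

print(daily_totals.tolist())  # [16, 7, 7]
```

With matplotlib installed, `daily_totals.plot(kind="bar")` followed by `plt.show()` should draw the same kind of bar chart as the plt.bar call above. This relies on all lists having the same length, which holds for the 31-day subset.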

Upvotes: 3
