Maverick

Reputation: 799

Python/Pandas aggregating by date

I am trying to count and plot the number of data points I have for each area by day. So far I have:

[image]

But I would like to show the number of instances of each county per day, with the end goal of plotting them on a line graph, like:

[image]

Only I would want to plot each county on its own line, rather than the total which I have plotted above.

Update:

I have managed to get this from the answers provided:

[image]

Which is great and exactly what I was looking for. However, in hindsight, this looks a little messy and not very descriptive, even for the short period plotted, let alone if I were to plot a couple of years' worth of data.

So I'm thinking of plotting each county individually on an 8-panel grid. But when I try to plot this for one county, I get boolean values, as below:

[image]

What would be the best way to plot only the True values?
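A likely cause, assuming the single-county plot was built from a comparison such as df.county == "some county", is that the comparison returns one True/False per row rather than a count. The sketch below uses small hypothetical data (the county names and dates are placeholders) and shows two ways to turn that mask into a daily count: filter the rows first, or sum the mask per day.

```python
import pandas as pd

# Hypothetical sample data; the real df is assumed to have
# 'date_stamp' and 'county' columns, as in the answers below.
df = pd.DataFrame({
    "date_stamp": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-02",
                                  "2017-01-02", "2017-01-02", "2017-01-03"]),
    "county": ["Kent", "Essex", "Kent", "Kent", "Essex", "Essex"],
})

# A comparison yields one boolean per row; plotting this directly shows 0/1 values.
mask = df.county == "Kent"

# Option 1: keep only the matching rows, then count rows per day.
kent_daily = df[mask].groupby("date_stamp").size()

# Option 2: sum the mask per day (True counts as 1, False as 0).
kent_daily_from_mask = mask.groupby(df.date_stamp).sum()

print(kent_daily)
print(kent_daily_from_mask)
```

The two options differ on days with no records for the county: filtering drops those days entirely, while summing the mask keeps them as 0, which usually gives a more honest line plot.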

Upvotes: 2

Views: 81

Answers (2)

Ami Tavory

Reputation: 76297

You can try

df.county.groupby([df.date_stamp, df.county]).count().unstack().plot();
  • df.county...count() is the numerical series you want to plot.
  • groupby([df.date_stamp, df.county]) groups first by date_stamp, then by county (the order matters).
  • unstack will create a DataFrame whose index is the time stamp, and columns are counties.
  • plot(); will plot it (and the ; suppresses the unnecessary output).
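To make the intermediate shapes in the steps above concrete, here is a small sketch on hypothetical data (placeholder county names and dates). It also shows unstack(fill_value=0), an optional variation not in the original answer: without it, a county with no records on a given day becomes NaN, which a line plot draws as a gap.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's df.
df = pd.DataFrame({
    "date_stamp": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-02",
                                  "2017-01-02", "2017-01-02", "2017-01-03"]),
    "county": ["Kent", "Essex", "Kent", "Kent", "Essex", "Essex"],
})

# Step 1: count rows per (date_stamp, county) pair -> Series with a two-level index.
counts = df.county.groupby([df.date_stamp, df.county]).count()

# Step 2: move the inner level (county) into columns -> one column per county.
wide = counts.unstack()

# Variation: treat missing (day, county) pairs as zero instead of NaN.
wide_filled = counts.unstack(fill_value=0)

print(wide)
```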

Edit

To plot it on separate plots, you could do something like

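import matplotlib.pyplot as plt

for county in df.county.unique():
    this_county = df[df.county == county]  # rows for this county only
    # count rows per day within this county and plot them
    this_county.county.groupby(this_county.date_stamp).count().plot()
    plt.title(county)
    plt.show()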
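Since the question asks for an 8-panel grid, the loop above could instead draw each county into one axes of a single 2x4 figure. This is a sketch under assumptions: the data is hypothetical (eight placeholder counties, where county i has i records per day), and it uses matplotlib's plt.subplots with the headless Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: eight placeholder counties; county i has i rows per day.
counties = [f"County {i}" for i in range(1, 9)]
dates = pd.date_range("2017-01-01", periods=5, freq="D")
rows = [(d, c) for d in dates for i, c in enumerate(counties, start=1) for _ in range(i)]
df = pd.DataFrame(rows, columns=["date_stamp", "county"])

# One figure with a 2x4 grid of axes; shared axes make panels easy to compare.
fig, axes = plt.subplots(nrows=2, ncols=4, sharex=True, sharey=True, figsize=(16, 6))

# Pair each axes with one county (sorted so the layout is predictable).
for ax, county in zip(axes.flat, sorted(df.county.unique())):
    this_county = df[df.county == county]
    this_county.groupby("date_stamp").size().plot(ax=ax)
    ax.set_title(county)

fig.tight_layout()
# In an interactive session, call plt.show() here instead.
```

Passing ax=ax is what keeps each county in its own panel; without it, pandas draws on the current axes.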

Upvotes: 3

tozCSS

Reputation: 6114

pd.crosstab(df['date_stamp'],df['county']).plot()

EDIT: The question changed. If you want them in subplots instead of lines:

pd.crosstab(df['date_stamp'],df['county']).plot(subplots=True)

The key in drawing each county as a separate line is that each county needs to be in a different column. If you just want to count them, crosstab is then probably the shortest way to achieve that result. For example:
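As a sketch on hypothetical data (placeholder county names and dates), crosstab produces one column per county with zeros where a county has no records that day; the groupby/size/unstack(fill_value=0) chain builds the same counts, which may help when you later need a different aggregation than a plain count.

```python
import pandas as pd

# Hypothetical sample data; 'date_stamp' and 'county' mirror the question's columns.
df = pd.DataFrame({
    "date_stamp": pd.to_datetime(["2017-01-01", "2017-01-01", "2017-01-02",
                                  "2017-01-02", "2017-01-02", "2017-01-03"]),
    "county": ["Kent", "Essex", "Kent", "Kent", "Essex", "Essex"],
})

# Rows = days, columns = counties, cells = number of records.
table = pd.crosstab(df["date_stamp"], df["county"])

# Equivalent counts via groupby; fill_value=0 mirrors crosstab's zeros.
same = df.groupby(["date_stamp", "county"]).size().unstack(fill_value=0)

print(table)
```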

[image]

Then the result is: pd.crosstab(df['date_stamp'],df['county']).plot()

When subplots=True:

pd.crosstab(df['date_stamp'],df['county']).plot(subplots=True)
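For the 8-panel grid mentioned in the question, DataFrame.plot also accepts a layout=(rows, columns) argument together with subplots=True. The sketch below assumes hypothetical data with eight placeholder counties and uses the headless Agg backend; sharey=True is an optional choice so the panels share a common count scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Hypothetical data: eight placeholder counties; county i has i rows per day.
counties = [f"County {i}" for i in range(1, 9)]
dates = pd.date_range("2017-01-01", periods=5, freq="D")
rows = [(d, c) for d in dates for i, c in enumerate(counties, start=1) for _ in range(i)]
df = pd.DataFrame(rows, columns=["date_stamp", "county"])

table = pd.crosstab(df["date_stamp"], df["county"])

# One subplot per county column, arranged as 2 rows x 4 columns.
axes = table.plot(subplots=True, layout=(2, 4), sharey=True, figsize=(16, 6))
```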

Upvotes: 1
