piggyback
piggyback

Reputation: 9254

Plotting pandas time and category

I really can't get out of this. Here is my table:

where grade can be A,B,C

doc_id, grade, timestamp
1, A, 27/01/15
2, A, 27/01/15
3, B, 27/01/15
...

My aim is to show a graph with three lines, showing how many A, B and C I got through time.

I can only think of this:

docs[docs['grade']== 'A'].groupby('time').count()
docs[docs['grade']== 'B'].groupby('time').count()
docs[docs['grade']== 'C'].groupby('time').count()

and combine them some how, but it is already clear I am on the wrong way,

any hint?

Upvotes: 0

Views: 65

Answers (2)

Abinash Panda
Abinash Panda

Reputation: 464

Let the input_data be

       grade timestamp
doc_id                
1          A  27/01/15
2          A  27/01/15
3          B  27/01/15
4          C  27/01/15
5          A  27/01/16
6          A  27/01/16
7          A  27/01/16
8          B  27/01/16
9          B  27/01/16
10         C  27/01/16
11         A  27/01/16
12         B  27/01/16
13         C  27/01/16
14         C  27/01/16

So to show a graph with three lines, showing how many A, B and C you got through time, you can use

result = input_data.groupby(['timestamp']).apply(lambda x: x.grade.value_counts())

The output would be something like this

           A  B  C
timestamp         
27/01/15   2  1  1
27/01/16   4  3  3

You can plot the data using result.plot().

Upvotes: 1

Alexander
Alexander

Reputation: 109520

try this:

df2 = df.groupby(['timestamp', 'grade']).grade.size().unstack().cumsum().ffill().fillna(0)

It basically pivots by date and grade, rolling forward the cumulative sum.

>>> df2
grade       A  B   C
timestamp           
4/1/15      0  1   0
4/11/15     4  1   2
4/3/15      4  4   2
4/4/15      4  5   3
4/5/15      4  6   3
4/6/15      7  6   6
4/8/15      9  6   8
4/9/15     13  7  12

enter image description here

If you don't want a cumulative sum, you can just use:

df2 = df.groupby(['timestamp', 'grade']).grade.size().unstack().fillna(0)

Upvotes: 1

Related Questions