Reputation: 9254
I really can't get out of this. Here is my table:
where grade can be A,B,C
doc_id, grade, timestamp
1, A, 27/01/15
2, A, 27/01/15
3, B, 27/01/15
...
My aim is to show a graph with three lines, showing how many A, B and C I got through time.
I can only think of this:
docs[docs['grade']== 'A'].groupby('time').count()
docs[docs['grade']== 'B'].groupby('time').count()
docs[docs['grade']== 'C'].groupby('time').count()
and combine them some how, but it is already clear I am on the wrong way,
any hint?
Upvotes: 0
Views: 65
Reputation: 464
Let the input_data
be
grade timestamp
doc_id
1 A 27/01/15
2 A 27/01/15
3 B 27/01/15
4 C 27/01/15
5 A 27/01/16
6 A 27/01/16
7 A 27/01/16
8 B 27/01/16
9 B 27/01/16
10 C 27/01/16
11 A 27/01/16
12 B 27/01/16
13 C 27/01/16
14 C 27/01/16
So to show a graph with three lines, showing how many A, B and C you got through time, you can use
result = input_data.groupby(['timestamp']).apply(lambda x: x.grade.value_counts())
The output would be something like this
A B C
timestamp
27/01/15 2 1 1
27/01/16 4 3 3
You can plot the data using result.plot()
.
Upvotes: 1
Reputation: 109520
try this:
df2 = df.groupby(['timestamp', 'grade']).grade.size().unstack().cumsum().ffill().fillna(0)
It basically pivots by date and grade, rolling forward the cumulative sum.
>>> df2
grade A B C
timestamp
4/1/15 0 1 0
4/11/15 4 1 2
4/3/15 4 4 2
4/4/15 4 5 3
4/5/15 4 6 3
4/6/15 7 6 6
4/8/15 9 6 8
4/9/15 13 7 12
If you don't want a cumulative sum, you can just use:
df2 = df.groupby(['timestamp', 'grade']).grade.size().unstack().fillna(0)
Upvotes: 1