Reputation: 2273
I have a CSV file with data like this:
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 18:14:58,57,4
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 20:11:15,1884,90
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-10-04 09:44:21,1146,6
4be390eefaf9a64e7cb52937c4a5c77a,"avito.ru",2014-09-29 21:01:29,48,3
I group it like this:
print(infile.groupby(['address', infile['used_at'].dt.year]).active_seconds.sum())
and I get this result:
address            used_at
am.ru              2014         413071
                   2015         183402
auto.ru            2014        9122342
                   2015        6923367
avito.ru           2014       84503151
                   2015       87688571
avtomarket.ru      2014         106849
                   2015          95927
cars.mail.ru/sale  2014         211456
                   2015         167278
drom.ru            2014       11014955
                   2015        9704124
e1.ru              2014       28678357
                   2015       27961857
irr.ru/cars        2014         222193
                   2015         133678
I need to create a bar chart like this example.
But instead of men and women, I need bars for 2014 and 2015 for every website (on the x axis), with the sum of active_seconds on the y axis.
The example uses np.array, but I have a Series object.
I tried this:
width = 0.35
plt.figure()
ax = graph_by_duration['address'].plot(kind='bar', secondary_y=['active_seconds'])
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()
Should I convert it to an np.array, or is there another way to do this?
Upvotes: 1
Views: 3314
Reputation: 863421
I think you can first call reset_index and then pivot the DataFrame to create the columns 2014 and 2015. Finally, use plot.bar:
df = (infile.groupby(['address', infile['used_at'].dt.year])
            .active_seconds.sum()
            .reset_index())
print(df)
              address  used_at  active_seconds
0               am.ru     2014          413071
1               am.ru     2015          183402
2             auto.ru     2014         9122342
3             auto.ru     2015         6923367
4            avito.ru     2014        84503151
5            avito.ru     2015        87688571
6       avtomarket.ru     2014          106849
7       avtomarket.ru     2015           95927
8   cars.mail.ru/sale     2014          211456
9   cars.mail.ru/sale     2015          167278
10            drom.ru     2014        11014955
11            drom.ru     2015         9704124
12              e1.ru     2014        28678357
13              e1.ru     2015        27961857
14        irr.ru/cars     2014          222193
15        irr.ru/cars     2015          133678
graph_by_duration = df.pivot(index='address', columns='used_at', values='active_seconds')
print(graph_by_duration)
used_at                2014      2015
address
am.ru                413071    183402
auto.ru             9122342   6923367
avito.ru           84503151  87688571
avtomarket.ru        106849     95927
cars.mail.ru/sale    211456    167278
drom.ru            11014955   9704124
e1.ru              28678357  27961857
irr.ru/cars          222193    133678
import matplotlib.pyplot as plt

# One group of bars per address, one bar per year column
ax = graph_by_duration.plot.bar(figsize=(10,8))
ax.set_ylabel('Time online')
ax.set_title('Time spent online per web site, per year')
plt.show()
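As a possible shortcut, the reset_index and pivot steps can probably be replaced by a single unstack on the grouped Series, since the year is already an index level. The sketch below is self-contained and only illustrative: it rebuilds the first three sites of the grouped result from the question by hand (the name `totals` is my own, not from the original code) and uses the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hand-built stand-in for the grouped Series from the question
# (first three sites only); `totals` is a hypothetical name.
totals = pd.Series(
    [413071, 183402, 9122342, 6923367, 84503151, 87688571],
    index=pd.MultiIndex.from_product(
        [["am.ru", "auto.ru", "avito.ru"], [2014, 2015]],
        names=["address", "used_at"],
    ),
    name="active_seconds",
)

# Move the year level from the row index into the columns:
# one row per address, one column per year.
graph_by_duration = totals.unstack("used_at")

# Each column becomes one bar per address, so the years sit side by side.
ax = graph_by_duration.plot.bar(figsize=(10, 8))
ax.set_ylabel("Time online")
ax.set_title("Time spent online per web site, per year")
plt.tight_layout()
```

With the real data, `totals` would be the result of the groupby(...).active_seconds.sum() call, and plt.show() would display the chart. No conversion to np.array is needed in either variant.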
Upvotes: 3