NineWasps

Reputation: 2253

Groupby ID pandas

I have data like this:

ID,"address","used_at","active_seconds","pageviews"
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 18:14:58,57,4
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 20:11:15,1884,90
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-10-04 09:44:21,1146,6
4be390eefaf9a64e7cb52937c4a5c77a,"avito.ru",2014-09-29 21:01:29,48,3

The dates are only from 2014 and 2015.

I want to group by address and year to get the number of users for every website in every year, but I get an error:

print(DataFrame({'count' : infile.groupby(["address", infile['used_at'].dt.year].ID.nunique())}))

AttributeError: 'list' object has no attribute 'ID'

I tried renaming my columns:

infile = pd.read_csv("avito_trend.csv", header=None, parse_dates=[2], low_memory=False, names=['user', 'address', 'date', 'duration', 'unknown'] )

But that did not help, because the old names don't go away.

How can I do this grouping?

Upvotes: 1

Views: 1173

Answers (1)

jezrael

Reputation: 862761

I think you can count the unique users (ID) in each group with nunique, then add reset_index to turn the result into a DataFrame:

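import pandas as pd

# Column 2 (used_at) is parsed as datetime so that .dt.year is available
infile = pd.read_csv("test/avito_trend.csv", parse_dates=[2])

# Unique IDs per (address, year); reset_index names the value column 'count'
print(infile.groupby(["address", infile['used_at'].dt.year])['ID']
            .nunique()
            .reset_index(name='count'))
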
              address  used_at  count
0               am.ru     2014    621
1               am.ru     2015    273
2             auto.ru     2014   1752
3             auto.ru     2015   1595
4            avito.ru     2014   5460
5            avito.ru     2015   4631
6       avtomarket.ru     2014    314
7       avtomarket.ru     2015    215
8   cars.mail.ru/sale     2014    457
9   cars.mail.ru/sale     2015    271
10            drom.ru     2014   1934
11            drom.ru     2015   1623
12              e1.ru     2014   1654
13              e1.ru     2015   1359
14        irr.ru/cars     2014    619
15        irr.ru/cars     2015    426
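
As for the AttributeError in the question: it appears to come from a misplaced closing bracket, so `.ID` ends up being looked up on the list of grouping keys instead of on the groupby result. Here is a minimal sketch that reproduces both the error and the fix on the four sample rows from the question. Reading from an in-memory string instead of the real CSV file, and the variable names `csv_text`, `keys`, `error_message` and `unique_users`, are assumptions made only for illustration.

```python
import io

import pandas as pd

# The four sample rows from the question, read from a string (assumption:
# the real file has the same header and layout).
csv_text = '''ID,"address","used_at","active_seconds","pageviews"
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 18:14:58,57,4
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-09-30 20:11:15,1884,90
4be390eefaf9a64e7cb52937c4a5c77a,"e1.ru",2014-10-04 09:44:21,1146,6
4be390eefaf9a64e7cb52937c4a5c77a,"avito.ru",2014-09-29 21:01:29,48,3
'''

infile = pd.read_csv(io.StringIO(csv_text), parse_dates=["used_at"])

# Grouping keys: a column name plus a Series holding the year of each row.
keys = ["address", infile["used_at"].dt.year]

# What the original line effectively did: attribute access on the list itself.
try:
    keys.ID
except AttributeError as exc:
    error_message = str(exc)

# Fix: close groupby(...) first, then select the ID column and aggregate.
unique_users = infile.groupby(keys)["ID"].nunique().reset_index(name="count")
print(unique_users)
```

With only one user in the sample, every (address, year) group has a count of 1.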

Or, if you need the total number of rows per group (every non-null ID, not just distinct users), use count:

print(infile.groupby(["address", infile['used_at'].dt.year])['ID']
            .count()
            .reset_index(name='count'))

              address  used_at   count
0               am.ru     2014    1422
1               am.ru     2015     867
2             auto.ru     2014   14670
3             auto.ru     2015   13237
4            avito.ru     2014  150240
5            avito.ru     2015  145915
6       avtomarket.ru     2014     648
7       avtomarket.ru     2015     681
8   cars.mail.ru/sale     2014    1211
9   cars.mail.ru/sale     2015     726
10            drom.ru     2014   18678
11            drom.ru     2015   16194
12              e1.ru     2014   64426
13              e1.ru     2015   67164
14        irr.ru/cars     2014    1472
15        irr.ru/cars     2015     878
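
If both numbers are useful, they can probably be computed in one pass with named aggregation. The following is a sketch only: the small DataFrame is made-up illustration data (not the real avito_trend.csv), and the output column names `year`, `unique_users` and `visits` are hypothetical choices.

```python
import pandas as pd

# Made-up illustration data with the same column names as in the question.
infile = pd.DataFrame({
    "ID": ["a", "a", "b", "a", "c"],
    "address": ["e1.ru", "e1.ru", "e1.ru", "avito.ru", "avito.ru"],
    "used_at": pd.to_datetime([
        "2014-09-30 18:14:58",
        "2014-10-04 09:44:21",
        "2014-11-01 12:00:00",
        "2014-09-29 21:01:29",
        "2015-01-05 08:30:00",
    ]),
})

# Rename the year Series so the resulting column is called 'year'.
year = infile["used_at"].dt.year.rename("year")

# nunique counts distinct IDs, count counts all non-null IDs in each group.
summary = (
    infile.groupby(["address", year])["ID"]
    .agg(unique_users="nunique", visits="count")
    .reset_index()
)
print(summary)
```

Here the e1.ru group for 2014 has two distinct users across three rows, which shows how the two aggregations differ.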

Upvotes: 1
