WoodChopper
WoodChopper

Reputation: 4375

Aggregating columns in groupby statement

When using group by in dataframe, can I collect the result of a particular column as list?

I am not sure this detail make sense here but, in PostgreSQL there is a function array_agg(columnname) to achieve the same.

And also I tried finding details in API documentation but not successful in that attempt.

train
Out[6]: 
    TripType  VisitNumber Weekday  ScanCount  DepartmentDescription
1         30            7  Friday          1                  SHOES
2         30            7  Friday          1          PERSONAL CARE
3         26            8  Friday          2  PAINT AND ACCESSORIES
4         26            8  Friday          2  PAINT AND ACCESSORIES
5         26            8  Friday          2  PAINT AND ACCESSORIES
6         26            8  Friday          1  PAINT AND ACCESSORIES
7         26            8  Friday          1  PAINT AND ACCESSORIES
8         26            8  Friday          1  PAINT AND ACCESSORIES
9         26            8  Friday         -1  PAINT AND ACCESSORIES
10        26            8  Friday          1            DSD GROCERY
11        26            8  Friday          2  PAINT AND ACCESSORIES
12        26            8  Friday          1  MEAT - FRESH & FROZEN
13        26            8  Friday          1  PAINT AND ACCESSORIES
14        26            8  Friday         -1  PAINT AND ACCESSORIES
15        26            8  Friday          2  PAINT AND ACCESSORIES
16        26            8  Friday          1  PAINT AND ACCESSORIES
17        26            8  Friday          1  PAINT AND ACCESSORIES
18        26            8  Friday          2                  DAIRY
19        26            8  Friday          1      PETS AND SUPPLIES

train.groupby(['VisitNumber','Weekday','TripType']).count()
Out[7]: 
                              ScanCount  DepartmentDescription
VisitNumber Weekday TripType                                  
7           Friday  30                2                      2
8           Friday  26               17                     17

I mean result like below for first grouped row,

                              ScanCount  DepartmentDescription
VisitNumber Weekday TripType                                  
7           Friday  30                2                     [SHOES,PERSONAL CARE]

Data set:

{'DepartmentDescription': {1: 'SHOES',
  2: 'PERSONAL CARE',
  3: 'PAINT AND ACCESSORIES',
  4: 'PAINT AND ACCESSORIES',
  5: 'PAINT AND ACCESSORIES'},
 'ScanCount': {1: 1, 2: 1, 3: 2, 4: 2, 5: 2},
 'TripType': {1: 30, 2: 30, 3: 26, 4: 26, 5: 26},
 'VisitNumber': {1: 7, 2: 7, 3: 8, 4: 8, 5: 8},
 'Weekday': {1: 'Friday', 2: 'Friday', 3: 'Friday', 4: 'Friday', 5: 'Friday'}}

Upvotes: 2

Views: 94

Answers (1)

EdChum
EdChum

Reputation: 394459

IIUC you want the following:

In [248]:
df.groupby(['VisitNumber','Weekday','TripType'])['DepartmentDescription'].apply(list)

Out[248]:
VisitNumber  Weekday  TripType
7            Friday   30                                     [SHOES, PERSONAL CARE]
8            Friday   26          [PAINT AND ACCESSORIES, PAINT AND ACCESSORIES,...
Name: DepartmentDescription, dtype: object

Upvotes: 1

Related Questions