Martin Noah
Martin Noah

Reputation: 167

After using groupby, how do I get all values from multiple rows into a list?

I have a DataFrame with names of people, dates, start/end times, and durations. I want to group by name and date, sum the Duration, and also "sum" the Start and End values by throwing them into a list.

df = pd.DataFrame([
    ['Bar', '2/18/2019', '7AM', '9AM',120],
    ['Bar', '2/18/2019', '9AM', '11AM',120],
    ['Foo', '2/18/2019', '10AM', '12PM',120],
    ],
    columns=['Name', 'Date', 'Start','End','Duration'])

Looking to turn this...

Into this...

Where I am using groupby to get the sum of Duration for Name and Date...

df.groupby(['Name','Date'])['Duration'].sum().reset_index()

...but having a heck of a time trying to figure out how to throw all of those times into a list. I've tried .apply and building a dictionary where the key is Name+date and the value is the list, but to no avail.

Any hints or gentle nudges in the right direction?

Upvotes: 2

Views: 152

Answers (4)

braintho
braintho

Reputation: 401

When using more than one aggregation function, I opt to use the pandas.pivot_table. To me it is easier to read and understand what is happening

df['Times'] = df.Start +'-'+ df.End
pd.pivot_table(df,
               values = ['Duration','Times'],
               index = ['Name','Date'],
               aggfunc = {'Duration':np.sum,
                          'Times': list}
              )

output:

                Duration                Times
Name Date                                    
Bar  2/18/2019       240  [7AM-9AM, 9AM-11AM]
Foo  2/18/2019       120          [10AM-12PM]

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153500

One-liner:

df.assign(Times=df['Start']+'-'+df['End'])\
  .groupby(['Name','Date'], as_index=False)\
  .agg({'Duration':'sum','Times':list})

Output:

  Name       Date  Duration                Times
0  Bar  2/18/2019       240  [7AM-9AM, 9AM-11AM]
1  Foo  2/18/2019       120          [10AM-12PM]

Upvotes: 2

Code Different
Code Different

Reputation: 93181

Try this:

df['Time'] = df['Start'] + '-' + df['End']

df.groupby(['Name', 'Date']).apply(lambda x: pd.Series({
    'Duration': x['Duration'].sum(),
    'Times': x['Time'].values
}))

Times now contains ndarrays of strings.

Upvotes: 4

BENY
BENY

Reputation: 323326

Here is one way

df['Time']=df.Start+'-'+df.End
df.groupby(['Name','Date']).agg({'Time':'unique','Duration':'sum'})
Out[242]: 
                               Time  Duration
Name Date                                    
Bar  2/18/2019  [7AM-9AM, 9AM-11AM]       240
Foo  2/18/2019          [10AM-12PM]       120

Upvotes: 3

Related Questions