Working with Pandas Groupby and multiple rows

Question

I've searched everywhere and tried everything I could but can't quite get what I want from my data.

Background:

I have a set of data derived from invoice data. I've massaged that data to the point where I have a pandas dataframe consisting of five columns. The columns are listed here, and sample data is below:

Project_ID - ID for the project
Project_Type - Type of matter (the kind of work performed) for the project
Create Date - Creation date of the project (when the project was initiated)
Invoice Dates - Dates that invoices were generated for the project
Project Age - the age of each invoice (calculated from project initiation date)

Data sample can be found in this CSV file.

Each project can have multiple invoices, which is what is causing my issue.

What I want to do:

Aggregate by Project Type and get the min, max, mean and std of "Age" for each project type. I thought this would be a simple groupby on the Project_Type column, but I can't get the min, max, mean and std functions to work when applied to that groupby.

I'm sure this is a simple issue but nothing I've found has solved it for me.

Any help or pointers appreciated.

Data sample:

Project_ID  Project_Type    Create_Date     Invoice_Dates   Age
25098       Computers       1/11/12 0:00    2/6/12 0:00     26 days 
25098       Computers       1/11/12 0:00    2/29/12 0:00    49 days 
25113       Telecom         1/12/12 0:00    4/30/12 0:00    109 days 
25113       Telecom         1/12/12 0:00    6/30/12 0:00    170 days

Bob Haffner · Accepted Answer

Eric, I didn't download your file, but I took a swing at it. I would post the first few lines in your question so we don't have to download it.

Yes, groupby() would be a good way to go. You can specify the aggregation functions in a list, like this:

df[['Project_Type','Project Age']].groupby('Project_Type').agg(['min',
                                                            'max',
                                                            'mean',
                                                            'std'])
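Since I didn't run this against your file, here is a small self-contained sketch built from the four sample rows in the question. It uses the column names from the data sample (Project_Type, Age) rather than 'Project Age', and it assumes Age holds timedelta-like values such as "26 days". The Age_Days column is a helper name I made up for illustration; converting to a plain number of days is just one way to get a numeric column that min, max, mean and std can all work on.

```python
import pandas as pd

# Sample rows copied from the question's data sample.
df = pd.DataFrame({
    "Project_ID": [25098, 25098, 25113, 25113],
    "Project_Type": ["Computers", "Computers", "Telecom", "Telecom"],
    "Age": ["26 days", "49 days", "109 days", "170 days"],
})

# Assumption: Age is a timedelta (or a string such as "26 days").
# Age_Days is a hypothetical helper column holding the age as whole days.
df["Age_Days"] = pd.to_timedelta(df["Age"]).dt.days

# One row per project type, one column per aggregation function.
stats = df.groupby("Project_Type")["Age_Days"].agg(["min", "max", "mean", "std"])
print(stats)
```

Every invoice row counts as one observation here, so a project with many invoices contributes more to the mean and std than a project with a single invoice.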
