Eric D. Brown D.Sc.

Reputation: 1956

Working with Pandas Groupby and multiple rows

I've searched everywhere and tried everything I could but can't quite get what I want from my data.

Background:

I have a set of data derived from invoice data. I've massaged it to the point where I have a pandas DataFrame consisting of six columns. The columns and some sample data are shown below.

Data sample can be found in this CSV file.

Each project can have multiple invoices, which is what is causing my issue.

What I want to do:

Aggregate by Project_Type and get the min, max, mean and std of "Age" for each project type. I thought this would be a simple groupby on the Project_Type column, but I can't get the min, max, mean and std functions to work when applied to that groupby.

I'm sure this is a simple issue but nothing I've found has solved it for me.

Any help or pointers appreciated.

Data sample:

Project_ID  Project_Type    Create_Date     Invoice_Dates   Age
25098       Computers       1/11/12 0:00    2/6/12 0:00     26 days 
25098       Computers       1/11/12 0:00    2/29/12 0:00    49 days 
25113       Telecom         1/12/12 0:00    4/30/12 0:00    109 days 
25113       Telecom         1/12/12 0:00    6/30/12 0:00    170 days 

Upvotes: 0

Views: 777

Answers (1)

Bob Haffner

Reputation: 8483

Eric, I didn't download your file, but I took a swing at it. I would post the first few lines in your question so we don't have to download it.

Yes, groupby() would be a good way to go. You can pass the aggregation functions to agg() as a list, like this:

# 'Age' is the column name shown in the question's data sample
df[['Project_Type', 'Age']].groupby('Project_Type').agg(['min',
                                                         'max',
                                                         'mean',
                                                         'std'])
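For what it's worth, here is a minimal, self-contained sketch of that groupby/agg call run against the four sample rows from the question. It assumes the Age column was read in as text such as "26 days", which is a guess on my part and not something the question confirms. The Age_Days column is a helper name I made up. Converting to a plain number of days first sidesteps any trouble computing mean or std on non-numeric values.

```python
import pandas as pd

# The four sample rows from the question, with Age kept as text
# (assumption: this is how the column arrives from the CSV).
df = pd.DataFrame({
    "Project_ID": [25098, 25098, 25113, 25113],
    "Project_Type": ["Computers", "Computers", "Telecom", "Telecom"],
    "Create_Date": ["1/11/12 0:00", "1/11/12 0:00", "1/12/12 0:00", "1/12/12 0:00"],
    "Invoice_Dates": ["2/6/12 0:00", "2/29/12 0:00", "4/30/12 0:00", "6/30/12 0:00"],
    "Age": ["26 days", "49 days", "109 days", "170 days"],
})

# Parse the text into timedeltas, then keep the whole number of days.
# "Age_Days" is a hypothetical helper column, not part of the original data.
df["Age_Days"] = pd.to_timedelta(df["Age"]).dt.days

# One row per project type, one column per statistic.
stats = df.groupby("Project_Type")["Age_Days"].agg(["min", "max", "mean", "std"])
print(stats)
```

Note that this aggregates over every invoice row, so a project with several invoices contributes several ages to its type. If you want one age per project instead, you would need to reduce by Project_ID first (for example, taking the max age per project) before grouping by Project_Type.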

Upvotes: 2
