Aggregating and custom function pandas

Question

I have a dataframe like the following:

Label	Indicator	Value1	Value2
A	77	50	50
A	776	60	70
A	771	70	40
A	7	80	50
A	7775	90	40
B	776	100	40
B	771	41	50
B	775	54	40
B	7775	55	50

What I want is an output like that:

Label	aggregation1	aggregation2
A	aggregation1_A_value	aggregation2_A_value
B	aggregation1_B_value	aggregation2_B_value

Knowing that the way I want to aggregate value is the following (example):

aggregation1 = value1 of indicator starting with 77 (but not 776) - value2 of indicator 776 and 775.

What I am doing now is the following: I split the Indicator into several columns, to have a new data frame:

Label	Indicator0	Indicator1	Indicator2	...
A	7	77	77	...
A	7	77	776	...
A	7	77	771	...
...	...	...	...	...
B	7	77	777	...

aggregation1_A = df.query("Label=='A' and Indicator1 is in ["77"] and Indicator2 is not in ["776"]")["value1"].sum()
aggregation1_A -= df.query("Label=='A' and Indicator2 is in ["776","775"]")["value2"].sum()

My issue is that I have more than 70 000 differents labels, and about 20 aggregations to run. Dataframe is 500MB large.

I am wondering if there is any better way. I had a look with pandas UDF and apply a custom aggregation function but I didn't succeed so far.

Thank you for your help

Aggregating and custom function pandas

Answers (1)

Related Questions