FrancescoLS
FrancescoLS

Reputation: 366

apply function to all columns in pandas groupby dataframe

I have the following dataframe (let's call it data):

id | type | val1 | val2 |
-------------------------
1  |  A   | 10.1 | 11.0 |
1  |  B   | 10.5 | 11.2 |
2  |  A   | 10.7 | 10.9 |
2  |  B   | 10.6 | 11.1 |
3  |  A   | 10.3 | 10.5 |
3  |  B   | 10.4 | 11.3 |

and I want to obtain the difference between the types A and B (A-B) for each id for each valX column, i.e. I want the result to be:

id | val1 | val2 |
------------------
1  | -0.4 | -0.2 |
2  |  0.1 | -0.2 |
3  | -0.1 | -0.8 |

The only way I could get this done was to define a function:

def getDelta(df, valName):
    return df[ df['type']=='A' ][valName].values[0] - df[ df['type']=='B' ][valName].values[0]

and apply it for each column separately:

data.groupby('id').apply(getDelta,valName='val1')

and then merge the results to obtain what I was looking for.

Is there a more efficient way to do it? In the end I want to apply a function to a subset of columns of the grouped dataframe, but this function has to take into account the values of another column.

Upvotes: 2

Views: 1410

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

As it currently is you can use np.subtract.reduce, assuming 'A' comes before 'B' in all cases, and there are no duplicates:

df.groupby("id", sort = False).agg(np.subtract.reduce).reset_index()

   id  val1  val2
0   1  -0.4  -0.2
1   2   0.1  -0.2
2   3  -0.1  -0.8

Upvotes: 2

sophocles
sophocles

Reputation: 13821

You can groupby() your ID column and use diff(-1) on your valX columns. Wrapping the operation in concat(), will you give you your desired outcome.

df.set_index('id',inplace=True)
pd.concat([df.groupby(['id'])[df.filter(like='val').columns.tolist()].diff(-1).dropna()]).reset_index()

   id  val1  val2  val3
0   1  -0.4  -0.2  -3.1
1   2   0.1  -0.2  17.0
2   3  -0.1  -0.8   1.5

I have added an extra valX just for illustration purposes.

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195418

You can pivot the dataframe:

x = df.pivot(index="id", columns="type", values="val1")
y = df.pivot(index="id", columns="type", values="val2")

df = pd.concat([x["A"] - x["B"], y["A"] - y["B"]], axis=1).rename(
    columns={0: "val1", 1: "val2"}
)
print(df)

Prints:

    val1  val2
id            
1   -0.4  -0.2
2    0.1  -0.2
3   -0.1  -0.8

Upvotes: 1

Related Questions