Reputation: 515
I have a pandas dataframe with laboratory tests, and for each sample, I have a few tests. For example, in this synthetic data I have 3 sample each one with 3 test pieces.
I need to get the value of a column, based on the maximum value of another column for each test, and all samples. So at the end, I would have a pandas dataframe with 9 rows.
The synthetic data is:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
name = np.array(['PI-01','PI-02','PI-03'])
name = np.repeat(name,30)
ID = np.array(['CP-A','CP-B','CP-C'])
ID = np.repeat(ID,10)
ID = np.tile(ID,3)
sig_3 = np.array([100,200,300])
sig_3 = np.repeat(sig_3,10)
sig_3 = np.tile(sig_3,3)
np.random.seed(42)
def_ax1 = np.sort(np.random.uniform(0.3,15.,10))
def_ax2 = np.sort(np.random.uniform(0.3,15.,10))
def_ax3 = np.sort(np.random.uniform(0.3,15.,10))
def_ax = np.concatenate((def_ax1,def_ax2,def_ax3),axis=0)
def_ax = np.tile(def_ax,3)
q = np.linspace(100,500,90)
p = np.linspace(100,800,90)
du = np.random.uniform(-20,500,90)
tx = pd.DataFrame({'Name':name, 'ID':ID,'sig_3':sig_3,'def_ax':def_ax,'q':q,'p':p,'du':du})
and the code I'm trying is:
max_du = tx.loc[tx.groupby('Name')['du'].idxmax()]
But gives the maximum for all group name, and not for the subgroup ID. I guess I need to group by Name, and for ID, and then find the max of du, how could I group like this? the result I would like is:
PI01 - max du for CP-01
max du for CP-02
max du for CP-03
PI02 - max du for CP-01
max du for CP-02
max du for CP-03
PI03 - max du for CP-01
max du for CP-02
max du for CP-03
Upvotes: 1
Views: 2467
Reputation: 311
First of all, well done for postig sample data and expected output to this question!!
With groupby you don't need to use tx.loc
, here your answer:
tx.groupby(['Name','ID'])['du'].max()
groupby:
.max()
- called methodafter calling the columns you need a method (since x values must be compressed in the cell. ex: .unique()
, .sizem()
, .min()
,.mean()
,.max()
, etc...
Alternative:
I like to use .agg
, since it allows me to use functions (numpy, lambda, etc.) instead of methods and I have a better overview with it.
max_du=tx.groupby(['Name','ID']).agg({'du':np.max})
Here an example with more features:
tx.groupby(['Name','ID']).agg({'du':(np.max,np.size,np.mean), 'q': lambda x: list(pd.unique(x))})
.agg
allows us to get more statistics from column 'du' and a list of all 'q' values
tx.loc[tx.groupby(['Name','ID'])['du'].idxmax()]
returns a part of the dataframe. Exactly where du
hat its max (for the above groups).
Upvotes: 3