Reputation: 1163
Apologies if the question isn't totally clear. However, I do have some example code showing the desired input and output (see below).
I have a (large) pandas dataframe and want to select the minimum values in pval1 and the corresponding lag. I also want to select the minimum value in pval2 and the corresponding lag. I would like to do this for each pair of variables (i.e. (A and B), (A and C) and (B and D)). Each pair of variables occurs multiple times in the dataset.
I've tried several approaches to try and get the output I want, but seem to be missing something logic wise and I'm not quite sure what. Any help would be greatly appreciated.
Thankyou to anyone who helps!
The dataframe looks something like this :
myxdf = pd.DataFrame({
'pval1': [0.01,0.2,0.001,0.3,0.0003,0.05,1,0.002,0.2],
'pval2': [0.3,0.02,0.002,0.9,0.001,0.002,0.10,0.93,0.00001],
'lag': [1,2,3,1,2,3,1,2,3],
'var1': ['A','A','A','A','A','A','B','B','B'],
'var2': ['B','B','B','C','C','C','D','D','D']
})
myxdf
My desired output for the above example should ideally look like this (please note the new lag1 and lag2 columns):
myxdf2 = pd.DataFrame({
'pval1': [0.0010,0.0003,0.002],
'pval2' : [0.002,0.001,0.00001],
'lagp1': ['3','2','2'],
'lagp2': ['3','2','3'],
'var1': ['A','A','B'],
'var2': ['B','C','D']
})
myxdf2
Upvotes: 1
Views: 1152
Reputation: 862661
I believe you need DataFrameGroupBy.idxmin
for indices of minimal values, use it for select rows, rename columns and join by concat
:
df = myxdf.groupby(['var1','var2'])[['pval1', 'pval2']].idxmin()
df1 = myxdf.loc[df['pval1'], ['pval1','lag']].rename(columns={'lag':'lagp1'})
df2 = myxdf.loc[df['pval2'], ['pval2','lag','var1','var2']].rename(columns={'lag':'lagp2'})
df = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
cols = ['pval1', 'pval2', 'lagp1', 'lagp2', 'var1', 'var2']
df = df[cols]
print (df)
pval1 pval2 lagp1 lagp2 var1 var2
0 0.0010 0.00200 3 3 A B
1 0.0003 0.00100 2 2 A C
2 0.0020 0.00001 2 3 B D
Upvotes: 1