Reputation: 181
ID onevalue twovalue
ABC 0.568 0.004
ABC 5.620 -2.514
ABC -1.551 -0.452
LMN 14.56 7.456
LMN 6.750 2.896
LMN 5.620 1.166
LMN 8.640 3.486
Based on the dataframe above, i have the following function:
cond = (df.onevalue > df.twovalue)
df['S'] = np.where(cond, df.onevalue,"")
This works fine and gives me the appropriate 'onevalue' on a new 'S' dataframe/series.
What I am now trying to do is I want it to create multiple 'S' columns and put just single 'onevalue' variable per column for the given "ID". The column generation should restart after a new ID appears and that the repeated values should not count. Here's an intended result for clarification's sake:
ID onevalue twovalue S S1 S2
ABC 0.568 0.004 0.568
ABC 5.620 -2.514 5.620
ABC -1.551 -0.452
LMN 14.56 7.456 14.56
LMN 6.750 2.896 6.750
LMN 5.620 1.166
LMN 8.640 3.486 8.640
Not sure if pandas allow undefined creation of columns, if that is the case we can assume a maximum of 7 columns per ID.
I have been trying to work my way around it but no luck. Little help will be appreciated.
THANKS!
Upvotes: 2
Views: 64
Reputation: 150745
Yes, you can enumerate the rows with relative order within the ID
and unstack:
df.join(pd.DataFrame({'S':np.maximum(df.onevalue, df.twovalue),
'idx': df.groupby('ID').cumcount()})
.set_index('idx',append=True)['S']
.unstack('idx').fillna('').add_prefix('S')
)
Output:
ID onevalue twovalue S0 S1 S2 S3
0 ABC 0.568 0.004 0.568
1 ABC 5.620 -2.514 5.62
2 ABC -1.551 -0.452 -0.452
3 LMN 14.560 7.456 14.56
4 LMN 6.750 2.896 6.75
5 LMN 5.620 1.166 5.62
6 LMN 8.640 3.486 8.64
Upvotes: 1