Reputation: 1138
I have this dataset:
df = d = pd.DataFrame({
'duplicates': [
[('007', "us1", "us2", "time1", 'time2', 4)],
[('008', "us1", "us2", "time1", 'time2', 5)],
[('009', "us1", "us2", "time1", 'time2', 6)],
[('007', 'us2', "us3", "time1", 'time2', 4)],
[('008', 'us2', "us3", "time1", 'time2', 7)],
[('009', 'us2', "us3", "time1", 'time2', 11)],
[('001', 'us5', 'us1', "time1", 'time2', 0)],
[('008', 'us5', 'us1', "time1", 'time2', 19)],
[('007',"us3", "us2", "time1", 'time2', 2)],
[('007',"us3", "us2", "time1", 'time2', 34)],
[('009',"us3", "us2", "time1", 'time2', 67)]],
'numberOfInteractions': [1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 11]
})
That I manipule like this:
df['duplicates'] = df.apply(
lambda x: [(x['numberOfInteractions'],a, b, c, d, e, f) for a, b, c, d, e, f in x.duplicates], 1)
user_record_access =(pd.DataFrame(df["duplicates"].explode().tolist(),
columns=["numberOfInteractions", "ID","USER1","USER2","TAU1","TAU2","DELAY"])
.groupby(["USER1","USER2"])["numberOfInteractions"]
.agg(sum).to_frame().reset_index())
Then I want to arrive at a dataset like this:
USER1 USER2 numberOfInteractions proportion
us1 us2 6 0.315789
us2 us3 15 1
us3 us2 13 0.684211
us5 us1 15 1.000000
The 'proportion' in this case I want:
- numberOfInteractions of the line/sum(numberOfInteractions of USER2)
For example:
But I cannot find out how, I tried some things (like this one below) but without success:
user_record_access['proportion'] = user_record_access['numberOfInteractions']/user_record_access['sumUSER2'] #.apply(lambda x: x['numberOfInteractions']/x['numberOfInteractions'].sum(skipna=True)).reset_index()
Upvotes: 1
Views: 1181
Reputation: 71687
Use:
# Note: Here 'df1' refers to the dataframe 'user_record_access'
s = df1.groupby('USER2')['numberOfInteractions'].transform('sum')
df1['proportion'] = df1['numberOfInteractions'].div(s)
Details:
Use DataFrame.groupby
on column USER2
and use groupby.transform
to transfrom the column numberOfInteractions
using sum
:
print(s)
0 19
1 15
2 19
3 15
Name: USER2, dtype: int64
Use Series.div
and divide the column numberOfInteractions
by the transformed series s
and assign this result back to new column proportions
.
print(df1)
USER1 USER2 numberOfInteractions proportion
0 us1 us2 6 0.315789
1 us2 us3 15 1.000000
2 us3 us2 13 0.684211
3 us5 us1 15 1.000000
Upvotes: 2