Reputation: 480
I am new to Pandas and I am trying to do the following thing::
I need to create in arts a new column called articleScore. Each article must have the articleScore which is the sum of all commentScores related to that article (same articleID), divided by sqrt(n_comms + 1), where n_comms is the number of comments with that specific ID.
I already managed to do this but In a very inefficient way (pictured below)
for article in arts:
n, tempScore = 0
for i, value in comms.iterrows():
if value['articleID'] == article['articleID']:
tempScore + = value['commentScore']
n += 1
article['articleScore'] /= math.sqrt(n+1)
Edit: Here's an example of what I would like to happen:
comms:
__________________________
| # | artID | commScore |
| 0 | 1x5w | 2 |
| 1 | 77k3 | 1 |
| 2 | 77k3 | -1 |
| 3 | 3612 | 5 |
| 4 | 1x5w | 3 |
--------------------------
arts:
___________________________
| # | artID | artScore (?) |
| 0 | 1x5w | 2.89 |
| 1 | 77k3 | 0 |
| 2 | 3612 | 3.54 |
-------------------------
I need to (create and) fill the artScore column. Each artScore is the sum of the commentScores, but only of the comments with the same artID of the article, divided by sqrt(n+1).
Can anybody help me? Thanks a lot!
Andrea
Upvotes: 3
Views: 169
Reputation: 1070
#article count and sum
df = df.groupby('artID').agg(['sum', 'count'])
#create new column and utilize your formula
df['artScore'] = df['commScore']['sum'] / math.sqrt(df['commScore']['count']+1)
commScore artScore
sum count
artID
1x5w 5 2 5.0
3612 5 1 5.0
77k3 0 2 0.0
Upvotes: 1
Reputation: 3629
You can use groupby
with agg
and a custom lambda
function to apply to each group:
comms.groupby('artID').agg(
{'commScore': lambda x: x.sum() / np.sqrt(len(x) + 1)}
).reset_index().rename(columns={'commScore': 'artScore'})
Result:
artID artScore
0 1x5w 2.886751
1 3612 3.535534
2 77k3 0.000000
Upvotes: 1
Reputation: 1439
I think you can use groupby
followed by a merge on 'artID'
:
grpd = comms.groupby('artID')
to_merge = grpd.sum().divide(np.sqrt(grpd.count()+1)).reset_index().rename(columns={'commScore': 'artScore'})[['artID', 'artScore']]
arts.merge(to_merge, on='artID')
Upvotes: 1