Reputation: 307
I have this dataframe.
df = pd.DataFrame({'userId': [10,20,10,20,10,20,60,90,60,90,60,90,30,40,30,40,30,40,50,60,50,60,50,60],
'movieId': [500,500,800,800,700,700,1100,1100,1900,1900,2000,2000,1600,1600,1901,1901,3000,3000,3025,3025,4000,4000,500,500],
'ratings': [3.5,4.5,2.0,5.0,3.0,1.5,1.5,4.5,2.5,4.5,4.0,5.0,4.0,1.5,4.5,4.5,3.5,4.5,3.0,5.0,4.0,1.5,3.5,5]})
I have converted this dataframe to different chunks. After every 6 rows new chunk is created.
After creating chunks it looks like this
df
userId movieId ratings
0 10 500 3.5
1 20 500 4.5
2 10 800 2.0
3 20 800 5.0
4 10 700 4.0
5 20 700 1.5
userId movieId ratings
6 60 1100 3.5
7 90 1100 4.5
8 60 1900 3.5
9 90 1900 4.5
10 60 2000 2.0
11 90 2000 5.0
userId movieId ratings
12 30 1600 4.0
13 40 1600 1.5
14 30 1901 3.5
15 40 1901 4.5
16 30 3000 3.5
17 40 3000 4.5
userId movieId ratings
18 50 3025 2.0
19 60 3025 5.0
20 50 4000 4.0
21 60 4000 1.5
22 50 500 3.5
23 60 500 4.5
What i want to do is this: 1. Take absolute difference between two users for same movie (each pair of user have watched three movies). 2. Take the average of difference for first two movies then compare it with 3rd movie difference. if the third movie difference is less than avg of first two movies take it positive otherwise negative. For example, in first chunk diff b/w first movie is 1 and for next is 3 (avg diff is 2). For the 3rd pair diff is 2.5 (diff > avg) it should give negative result. 3. Do it for every chunk in dataframe.
positive_counter = []
negative_counter = []
numberOfUsers = 2
numberOfMovies = 3
usersLength = numberOfUsers*numberOfMovies
def chunker(seq, size):
return (seq[pos:pos + size] for pos in range(0, len(seq), size))
def finding_rating(df):
for i in chunker(data,usersLength):
df['diff'] = df.groupby('movieId')['ratings'].diff().abs()
#code
Upvotes: 1
Views: 1039
Reputation: 2407
For a chunk, you can apply this:
dfchunk
userId movieId ratings
0 10 500 3.5
1 20 500 4.5
2 10 800 2.0
3 20 800 5.0
4 10 700 4.0
5 20 700 1.5
r=dfchunk.pivot(index="movieId",columns="userId")
ratings
userId 10 20
movieId
500 3.5 4.5
700 4.0 1.5
800 2.0 5.0
r.columns=["u1","u2"]
r["drate"]=r.u1.sub(r.u2).abs()
u1 u2 drate
movieId
500 3.5 4.5 1.0
700 4.0 1.5 2.5
800 2.0 5.0 3.0
v= r.drate.iloc[:-1].mean()-r.drate.iloc[-1]
-1.25
Edit:
if v<0:
negative_counter.append(v)
else:
positive_counter.append(v)
Upvotes: 1