Sanwal
Sanwal

Reputation: 307

Finding absolute difference between rows and comparing the difference with other rows

I have this dataframe.

df = pd.DataFrame({'userId': [10,20,10,20,10,20,60,90,60,90,60,90,30,40,30,40,30,40,50,60,50,60,50,60],
                   'movieId': [500,500,800,800,700,700,1100,1100,1900,1900,2000,2000,1600,1600,1901,1901,3000,3000,3025,3025,4000,4000,500,500],  
                   'ratings': [3.5,4.5,2.0,5.0,3.0,1.5,1.5,4.5,2.5,4.5,4.0,5.0,4.0,1.5,4.5,4.5,3.5,4.5,3.0,5.0,4.0,1.5,3.5,5]})

I have converted this dataframe to different chunks. After every 6 rows new chunk is created.

After creating chunks it looks like this

df
   userId  movieId  ratings
0      10      500      3.5
1      20      500      4.5
2      10      800      2.0
3      20      800      5.0
4      10      700      4.0
5      20      700      1.5
    userId  movieId  ratings
6       60     1100      3.5
7       90     1100      4.5
8       60     1900      3.5
9       90     1900      4.5
10      60     2000      2.0
11      90     2000      5.0
    userId  movieId  ratings
12      30     1600      4.0
13      40     1600      1.5
14      30     1901      3.5
15      40     1901      4.5
16      30     3000      3.5
17      40     3000      4.5
    userId  movieId  ratings
18      50     3025      2.0
19      60     3025      5.0
20      50     4000      4.0
21      60     4000      1.5
22      50      500      3.5
23      60      500      4.5

What i want to do is this: 1. Take absolute difference between two users for same movie (each pair of user have watched three movies). 2. Take the average of difference for first two movies then compare it with 3rd movie difference. if the third movie difference is less than avg of first two movies take it positive otherwise negative. For example, in first chunk diff b/w first movie is 1 and for next is 3 (avg diff is 2). For the 3rd pair diff is 2.5 (diff > avg) it should give negative result. 3. Do it for every chunk in dataframe.

positive_counter = [] 
negative_counter = []

numberOfUsers = 2
numberOfMovies = 3
usersLength = numberOfUsers*numberOfMovies

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

def finding_rating(df):
     for i in chunker(data,usersLength):

        df['diff'] = df.groupby('movieId')['ratings'].diff().abs()
        #code




Upvotes: 1

Views: 1039

Answers (1)

kantal
kantal

Reputation: 2407

For a chunk, you can apply this:

dfchunk                                                                                                              
   userId  movieId  ratings
0      10      500      3.5
1      20      500      4.5
2      10      800      2.0
3      20      800      5.0
4      10      700      4.0
5      20      700      1.5

r=dfchunk.pivot(index="movieId",columns="userId")                                                                    

        ratings     
userId       10   20
movieId             
500         3.5  4.5
700         4.0  1.5
800         2.0  5.0

r.columns=["u1","u2"]                                                                                                

r["drate"]=r.u1.sub(r.u2).abs()                                                                                      

          u1   u2  drate
movieId                 
500      3.5  4.5    1.0
700      4.0  1.5    2.5
800      2.0  5.0    3.0

v= r.drate.iloc[:-1].mean()-r.drate.iloc[-1]                                                                            
-1.25

Edit:

if v<0:
   negative_counter.append(v)
else:
   positive_counter.append(v)

Upvotes: 1

Related Questions