pandaseatpotatos
pandaseatpotatos

Reputation: 95

Get Sum of Every Time Two Values Match

My googleing has failed me, I think my main issue is im unsure how to phrase the question (sorry about the crappy title). I'm trying to find the total each time 2 people vote the same way. Below you will see an example of how the data looks and the output I was looking for. I have a working solution but its very slow (see bottom) and was wondering if theres a better way to go about this.

This is how the data is shaped

----------------------------------
event   person  vote
 1        a      y
 1        b      n
 1        c      nv
 1        d      nv
 1        e      y
 2        a      n
 2        b      nv
 2        c      y
 2        d      n
 2        e      n
----------------------------------

This is the output im looking for

----------------------------------
Person  a   b   c   d   e
   a    2   0   0   1   2
   b    0   2   0   0   0
   c    0   0   2   1   0
   d    1   0   1   2   1
   e    2   0   0   1   2
----------------------------------


Working Code

df = df.pivot(index='event', columns='person', values='vote')

frame = pd.DataFrame(columns=df.columns, index=df.columns)

for person1, value in frame.iterrows():

    for person2 in frame:

        count = 0 
        for i, row in df.iterrows():

            person1_votes = row[person1]
            person2_votes = row[person2]

            if person1_votes == person2_votes:
                count += 1

        frame.at[person1, person2] = count

Upvotes: 3

Views: 96

Answers (2)

Andy L.
Andy L.

Reputation: 25239

@Wen-Ben already answered your question. It bases on the concept of finding all possibilities of pair-wise person and count those having same vote. Finding all pair-wise is cartesian product (cross join). You may read great post from @cs95 on cartesian product (CROSS JOIN) with pandas

In your problem, you count same vote per event, so it is cross joint per event. Therefore, you don't need adding helper key column as in @cs95 post. You may cross join directly on column event. After cross join, filter out those pair-wise person<->person having same vote using query. Finally, using crosstab to count those pair-wise.

Below is my solution:

df_match = df.merge(df, on='event').query('vote_x == vote_y')    
pd.crosstab(index=df_match.person_x, columns=df_match.person_y)

Out[1463]:
person_y  a  b  c  d  e
person_x
a         2  0  0  1  2
b         0  2  0  0  0
c         0  0  2  1  0
d         1  0  1  2  1
e         2  0  0  1  2

Upvotes: 1

BENY
BENY

Reputation: 323276

Try look at your problem in different way

df=df.assign(key=1)
mergedf=df.merge(df,on=['event','key'])
mergedf['equal']=mergedf['vote_x'].eq(mergedf['vote_y'])
output=mergedf.groupby(['person_x','person_y'])['equal'].sum().unstack()
output
Out[1241]: 
person_y    a    b    c    d    e
person_x                         
a         2.0  0.0  0.0  1.0  2.0
b         0.0  2.0  0.0  0.0  0.0
c         0.0  0.0  2.0  1.0  0.0
d         1.0  0.0  1.0  2.0  1.0
e         2.0  0.0  0.0  1.0  2.0

Upvotes: 3

Related Questions