Match values in dataframe rows

Question

I have a dataframe (df) that looks like:

name  type    cost
a     apples  1
b     apples  2
c     oranges 1 
d     banana  4
e     orange  6

Apart from using 2 for loops is there a way to loop through and compare each name and type in the list against each other and where the name is not itself (A vs A), the type is the same (apples vs apples) and its not a repeat of the same pair but the other way around e.g. if we have A vs B, I would not want to see B vs A, produce an output list of that looks:

name1, name2, status
    a      b       0
    c      e       0

Where the first 2 elements are the names where the criteria match and the third element is always a 0.

I have tried to do this with 2 for loops (see below) but can't get it to reject say b vs a if we already have a vs b.

def pairListCreator(staticData):
    for x, row1 in df.iterrows():

        name1 = row1['name']
        type1= row1['type']

        for y, row2 in df.iterrows():
            name2 = row['name']
            type2 = row['type']

            if name1<> name2 and  type1 = type2:
                pairList = name1,name2,0

Elmex80s · Accepted Answer

Something like this

import pandas as pd

# Data
data = [['a', 'apples', 1],
        ['b', 'apples', 2],
        ['c', 'orange', 1],
        ['d', 'banana', 4],
        ['e', 'orange', 6]]

# Create Dataframe
df = pd.DataFrame(data, columns=['name', 'type', 'cost'])
df.set_index('name', inplace=True)

# Print DataFrame
print df

# Count number of rows
nr_of_rows = df.shape[0]

# Create result and compare
res_col_nam = ['name1', 'name2', 'status']
result = pd.DataFrame(columns=res_col_nam)

for i in range(nr_of_rows):
    x = df.iloc[i]

    for j in range(i + 1, nr_of_rows):
        y = df.iloc[j]

        if x['type'] == y['type']:
            temp = pd.DataFrame([[x.name, y.name, 0]], columns=res_col_nam)
            result = result.append(temp)

# Reset the index
result.reset_index(inplace=True)
result.drop('index', axis=1, inplace=True)

# Print result
print 'result:'
print result

Output:

        type  cost
name              
a     apples     1
b     apples     2
c     orange     1
d     banana     4
e     orange     6
result:
  name1 name2  status
0     a     b     0.0
1     c     e     0.0

Match values in dataframe rows

Answers (2)

Related Questions