BeeGee
BeeGee

Reputation: 875

Conditionally and interatively calculate column based on value of three columns

I am running Windows 10, Python 2.7 through the Spyder IDE.

I have a pandas DataFrame called df:

df = pd.DataFrame({'fld1': ['x', 'x', 'x','y','y','y','z','z']
                , 'fld2': ['x', 'y', 'z','x','y','z','x','y']
                , 'relationship': [.25,.25,.50,.33,.33,.33,.5,.5]})

df
 Out[172]: 
  fld1 fld2  relationship
 0    x    x          0.25
 1    x    y          0.25
 2    x    z          0.50
 3    y    x          0.33
 4    y    y          0.33
 5    y    z          0.33
 6    z    x          0.50
 7    z    y          0.50

I would like build a function that iterates over the rows of a Dataframe df to produce a new column in df.

This function would start by:

Step 1: take the relationship column where fld1 = x and fld2 = x and then

Step 2: check to see if cases where fld1 = x has any more unique values of fld2.

Step 3: If there is another unique value of fld2 associated with fld1 = x (in this two more unique values exist, x and y), add the relationship value from Step 1 to the relationship column of fld1 = x and the next unique value of fld2 (in this example fld2 = y is the next unique value) multiplied by the inverse of the relationship (in this case fld1 = y and fld2 = x)

Step 4: repeat Step 2 until all unique values of fld2 with fld1 = x have been calculated in this way

Step 4: repeat Step 1 for the next unique value of fld1. In this case it would be fld1 = y

To explain this function logic another way, below is an example of how this would be done in excel:

        A       B         C             D
   1    fld1    fld2      relationship  Connection
   2    x        x         0.25         =C2+(C3*C5)+(C4*C8)
   3    x        y         0.25         =C3+(C4*C9)
   4    x        z         0.5          =C4+(C3*C7)
   5    y        x         0.33         =C5+(C7*C8)
   6    y        y         0.33         =C6+(C5*C3)+(C7*C9)
   7    y        z         0.33         =C7+(C5*C4)
   8    z        x         0.5          =C8+(C9*C5)
   9    z        y         0.5          =C9+(C8*C4)

The output of the function should product a Dataframe identical to df2 below:

df2 = pd.DataFrame({'fld1': ['x', 'x', 'x','y','y','y','z','z']
            , 'fld2': ['x', 'y', 'z','x','y','z','x','y']
            , 'relationship': [.25,.25,.50,.33,.33,.33,.5,.5]
            , 'connection':     [.5825,0.5,0.5825,0.495,0.5775,0.495,0.665,0.75]})

df2
Out[174]: 
    connection fld1 fld2  relationship
0      0.5825    x    x          0.25
1      0.5000    x    y          0.25
2      0.5825    x    z          0.50
3      0.4950    y    x          0.33
4      0.5775    y    y          0.33
5      0.4950    y    z          0.33
6      0.6650    z    x          0.50
7      0.7500    z    y          0.50

Upvotes: 1

Views: 102

Answers (1)

Kevin
Kevin

Reputation: 8207

OK here it goes, this is one way of solving your problem. I used a dictionary to hold the values for each combination.

xyzdict = {"xx":0.25,
          "xy":0.25,
          "xz":0.5,
          "yx":0.33,
          "yy":0.33,
          "yz":0.33,
          "zx":0.5,
          "zy":0.5}

Then, for each 'connection' combination, the first letter was always that same as the first letter for fld1. the second letter was always not fld1. So here is an exhaustive and maybe not pythonic way of calculating your values and storing the combinations' connections values in a dictionary for later use.

cnxn = {}
xyz = ["x","y","z"]

for combo in xyzdict.keys():
    #print "the combo is %s" % (combo) #xyzdict[two] #actual value
    first_letter = combo[0]

    not_second = [combo[0],combo[1]]
    not_second_letter = list(set(xyz) - set(not_second))

    if len(not_second_letter) > 1:
        multi_cnxn = []
        for each_not_second_letter in not_second_letter:

            fwd = ''.join((first_letter,each_not_second_letter))
            rev = ''.join((each_not_second_letter,first_letter))
            cnxnval = xyzdict[fwd] * xyzdict[rev]

            multi_cnxn.append(cnxnval)

        rowvalue = xyzdict[combo] + sum(multi_cnxn)
        cnxn[combo] =rowvalue
    else:
        fwd = ''.join((first_letter,not_second_letter[0]))
        rev = ''.join((not_second_letter[0],first_letter))
        cnxnval = xyzdict[fwd] * xyzdict[rev]

        rowvalue = xyzdict[combo] + cnxnval
        cnxn[combo] = rowvalue

Almost there, define a function check that will pull out your fld1 and fld2 and return the calculated values from cnxn above.

def check(fld1,fld2,cnxn_sub):
    rowpair = ''.join((fld1,fld2))
    return cnxn_sub[rowpair]

Finally, a little pandas apply to bring it all home.

df['connection'] = df.apply(lambda row: check(row['fld1'], row['fld2'],cnxn), axis=1)

Here are my results, our "yz" connection is a little off, idk if that is on your end or mine...

fld1    fld2    relationship    connection
0   x   x   0.25    0.5825
1   x   y   0.25    0.5000
2   x   z   0.50    0.5825
3   y   x   0.33    0.4950
4   y   y   0.33    0.5775
5   y   z   0.33    0.4125
6   z   x   0.50    0.6650
7   z   y   0.50    0.7500

Good Luck!

Upvotes: 1

Related Questions