Reputation: 875
I am running Windows 10, Python 2.7 through the Spyder IDE.
I have a pandas DataFrame
called df
:
df = pd.DataFrame({'fld1': ['x', 'x', 'x','y','y','y','z','z']
, 'fld2': ['x', 'y', 'z','x','y','z','x','y']
, 'relationship': [.25,.25,.50,.33,.33,.33,.5,.5]})
df
Out[172]:
fld1 fld2 relationship
0 x x 0.25
1 x y 0.25
2 x z 0.50
3 y x 0.33
4 y y 0.33
5 y z 0.33
6 z x 0.50
7 z y 0.50
I would like build a function
that iterates over the rows of a Dataframe
df
to produce a new column in df
.
This function would start by:
Step 1: take the relationship
column where fld1
= x
and fld2
= x
and then
Step 2: check to see if cases where fld1
= x
has any more unique values of fld2
.
Step 3: If there is another unique value of fld2
associated with fld1
= x
(in this two more unique values exist, x
and y
), add the relationship
value from Step 1 to the relationship
column of fld1
= x
and the next unique value of fld2
(in this example fld2
= y
is the next unique value) multiplied by the inverse of the relationship (in this case fld1
= y
and fld2
= x
)
Step 4: repeat Step 2 until all unique values of fld2
with fld1
= x
have been calculated in this way
Step 4: repeat Step 1 for the next unique value of fld1
. In this case it would be fld1
= y
To explain this function logic another way, below is an example of how this would be done in excel
:
A B C D
1 fld1 fld2 relationship Connection
2 x x 0.25 =C2+(C3*C5)+(C4*C8)
3 x y 0.25 =C3+(C4*C9)
4 x z 0.5 =C4+(C3*C7)
5 y x 0.33 =C5+(C7*C8)
6 y y 0.33 =C6+(C5*C3)+(C7*C9)
7 y z 0.33 =C7+(C5*C4)
8 z x 0.5 =C8+(C9*C5)
9 z y 0.5 =C9+(C8*C4)
The output of the function should product a Dataframe
identical to df2
below:
df2 = pd.DataFrame({'fld1': ['x', 'x', 'x','y','y','y','z','z']
, 'fld2': ['x', 'y', 'z','x','y','z','x','y']
, 'relationship': [.25,.25,.50,.33,.33,.33,.5,.5]
, 'connection': [.5825,0.5,0.5825,0.495,0.5775,0.495,0.665,0.75]})
df2
Out[174]:
connection fld1 fld2 relationship
0 0.5825 x x 0.25
1 0.5000 x y 0.25
2 0.5825 x z 0.50
3 0.4950 y x 0.33
4 0.5775 y y 0.33
5 0.4950 y z 0.33
6 0.6650 z x 0.50
7 0.7500 z y 0.50
Upvotes: 1
Views: 102
Reputation: 8207
OK here it goes, this is one way of solving your problem. I used a dictionary to hold the values for each combination.
xyzdict = {"xx":0.25,
"xy":0.25,
"xz":0.5,
"yx":0.33,
"yy":0.33,
"yz":0.33,
"zx":0.5,
"zy":0.5}
Then, for each 'connection' combination, the first letter was always that same as the first letter for fld1
. the second letter was always not
fld1
. So here is an exhaustive and maybe not pythonic way of calculating your values and storing the combinations' connections values in a dictionary for later use.
cnxn = {}
xyz = ["x","y","z"]
for combo in xyzdict.keys():
#print "the combo is %s" % (combo) #xyzdict[two] #actual value
first_letter = combo[0]
not_second = [combo[0],combo[1]]
not_second_letter = list(set(xyz) - set(not_second))
if len(not_second_letter) > 1:
multi_cnxn = []
for each_not_second_letter in not_second_letter:
fwd = ''.join((first_letter,each_not_second_letter))
rev = ''.join((each_not_second_letter,first_letter))
cnxnval = xyzdict[fwd] * xyzdict[rev]
multi_cnxn.append(cnxnval)
rowvalue = xyzdict[combo] + sum(multi_cnxn)
cnxn[combo] =rowvalue
else:
fwd = ''.join((first_letter,not_second_letter[0]))
rev = ''.join((not_second_letter[0],first_letter))
cnxnval = xyzdict[fwd] * xyzdict[rev]
rowvalue = xyzdict[combo] + cnxnval
cnxn[combo] = rowvalue
Almost there, define a function check
that will pull out your fld1
and fld2
and return the calculated values from cnxn
above.
def check(fld1,fld2,cnxn_sub):
rowpair = ''.join((fld1,fld2))
return cnxn_sub[rowpair]
Finally, a little pandas apply
to bring it all home.
df['connection'] = df.apply(lambda row: check(row['fld1'], row['fld2'],cnxn), axis=1)
Here are my results, our "yz" connection is a little off, idk if that is on your end or mine...
fld1 fld2 relationship connection
0 x x 0.25 0.5825
1 x y 0.25 0.5000
2 x z 0.50 0.5825
3 y x 0.33 0.4950
4 y y 0.33 0.5775
5 y z 0.33 0.4125
6 z x 0.50 0.6650
7 z y 0.50 0.7500
Good Luck!
Upvotes: 1