Reputation: 99
If a pandas dataframe df contains:
A B C D
a 1 2 3 4
b 2 NaN NaN 5
c NaN 7 NaN 2
d NaN 2 4 3
How do I add the first row to all the rest of the rows, only where they contain a number, to get the resulting dataframe:
A B C D
b 3 NaN NaN 9
c NaN 9 NaN 6
d NaN 4 7 7
I plan to do this, then make a dictionary of the row names and divide the product of the columns each row of the first table by the same row in the second table, saving the values in the dictionary. I have working code that does this (below), but I worry that it's not "PANDAS" enough and that it's overly complicated for the simple task I want to perform. Do I have the optimal solution, or am I missing something obvious?
If the Pandas code still has to iterate over the rows, then it's not worth it, but I feel like there should be a way to do this in-place.
Code:
import numpy as np
import pandas as pd
dindex = [1,2,3] #indices of drugs to select (set this)
def get_drugs(): #generates random "drug characteristics" as pandas df
cduct = ['dose','g1','g2','g3','g4','g5']
drg = ['d1','d2','d3','d4']
return pd.DataFrame(abs(np.random.randn(6,4)),index=cduct,columns=drg)
def sel_drugs(dframe, selct): #removes unwanted drugs from df.
#Pass df and dindex to this function
return dframe.iloc[:,selct].values, dframe[1:].index.tolist()
#returns a tuple of [values, names]
def cal_conduct(val, cnames): #calculates conductance scaling.
#Pass values and names to this function
cduct = {} #initialize dict
for ix, gname in enumerate(cnames):
_top = val[ix+1]; _bot = val[0]+val[ix+1]
cduct[gname] = (np.product(_top[np.isfinite(_top)])/
np.product(_bot[np.isfinite(_bot)]))
return cduct #return a dictionary of scaling factors
def main():
selection = sel_drugs(get_drugs(),dindex)
print cal_conduct(selection[0], selection[1])
main()
Upvotes: 0
Views: 703
Reputation: 99
Here is some code based on what @Jeff answered. It is about 40% slower, at least with the small test data, but it is simpler.
import numpy as np
import pandas as pd
dindex = [1,2,3] #indices of drugs to select (set this)
def get_drugs(): #generates random "drug characteristics" as pandas df
cduct = ['dose','g1','g2','g3','g4','g5']
drg = ['d1','d2','d3','d4']
return pd.DataFrame(abs(np.random.randn(6,4)),index=cduct,columns=drg)
def cal_conduct(frame,selct): #calculates conductance scaling.
#Pass df with selections made
s = frame.iloc[:,selct]
cduct = s.iloc[1:].prod(1)/(s.iloc[0]+s.iloc[1:]).prod(1)
return cduct.to_dict() #return a dictionary of scaling factors
def main():
scaling = cal_conduct(get_drugs(), dindex)
print scaling
main()
Upvotes: 0
Reputation: 129008
Pandas automatically aligns/broadcasts so this is straightforward
In [8]: df
Out[8]:
A B C D
a 1 2 3 4
b 2 NaN NaN 5
c NaN 7 NaN 2
d NaN 2 4 3
In [11]: df.iloc[1:] + df.iloc[0]
Out[11]:
A B C D
b 3 NaN NaN 9
c NaN 9 NaN 6
d NaN 4 7 7
The second part is this if I am reading it correctly
In [12]: df2 = df.iloc[1:] + df.iloc[0]
In [13]: df.prod()
Out[13]:
A 2
B 28
C 12
D 120
dtype: float64
In [14]: df2/df.prod()
Out[14]:
A B C D
b 1.5 NaN NaN 0.075000
c NaN 0.321429 NaN 0.050000
d NaN 0.142857 0.583333 0.058333
Upvotes: 3