Reputation: 29
OK, I feel like this one is very easy and the answer is probably already in a previous thread, but I couldn't work it out myself or find it in an existing thread. Here is what I've got: a dataframe with different samples belonging to groups:
import pandas as pd
df = pd.DataFrame({'sample1': [1,2,3], 'sample2':[2,4,6], 'sample3':[4,4,4], 'sample4':[6,6,6], 'divisor':[1,2,1]})
groups=[["sample1","sample2"],["sample3","sample4"]]
I want the code to create a new column for each sample that depends on the sum of the group the sample belongs to. The result should be 0 if the quotient (group sum divided by divisor) is below 4, otherwise it should be the original value. The summing part of this code works perfectly:
for i in range(len(groups)):
    df["groupsum"+str(i)]=df[groups[i]].sum(axis=1)
    for sample in groups[i]:
        df[sample+"_corr"]=""
        df[sample+"_corr"]= df[sample].apply(lambda x: 0 if (df["groupsum"+str(i)]/df["divisor"])<4 else df[sample])
I get the error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So what's the right way to handle this? Thanks a lot in advance.
Upvotes: 0
Views: 33
Reputation: 5745
Just use np.where instead of looping over the dataframe with apply:
df[sample+"_corr"]= np.where((df["groupsum"+str(i)]/df["divisor"])<4 , 0 , df[sample])
Output:
   sample1  sample2  sample3  sample4  divisor  groupsum0  sample1_corr  sample2_corr  groupsum1  sample3_corr  sample4_corr
0        1        2        4        6        1          3             0             0         10             4             6
1        2        4        4        6        2          6             0             0         10             4             6
2        3        6        4        6        1          9             3             6         10             4             6
This also performs better: apply calls a Python function once per element, which is slow, so it should be avoided whenever a vectorized alternative exists.
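As for why the original raised the error: inside the lambda, x is a single value, but the condition compares the whole groupsum/divisor Series against 4, so the `if` is handed a Series of booleans instead of one True or False. For completeness, here is a minimal self-contained sketch of the whole loop with np.where dropped in. The names `threshold` and `below` are my own, and the cutoff of 4 is taken from the code in the question, so adjust as needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sample1": [1, 2, 3],
    "sample2": [2, 4, 6],
    "sample3": [4, 4, 4],
    "sample4": [6, 6, 6],
    "divisor": [1, 2, 1],
})
groups = [["sample1", "sample2"], ["sample3", "sample4"]]
threshold = 4  # assumed cutoff, copied from the question's code

for i, group in enumerate(groups):
    sum_col = f"groupsum{i}"
    # Row-wise sum over the samples in this group
    df[sum_col] = df[group].sum(axis=1)
    # Boolean Series: True on rows where group sum / divisor is below the cutoff
    below = (df[sum_col] / df["divisor"]) < threshold
    for sample in group:
        # Element-wise choice: 0 where below is True, the original value otherwise
        df[f"{sample}_corr"] = np.where(below, 0, df[sample])

print(df)
```

Computing the boolean mask once per group and reusing it for every sample in that group also avoids repeating the division inside the inner loop.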
Upvotes: 1