Reputation: 350
(Full disclosure: this is related to another question I asked, so bear with me if I should have appended it to what I wrote previously, even though the problem is different.)
I have a dataframe consisting of a column of weights and columns containing binary values of 0 and 1. I'd like to multiply every column within the dataframe by the weights column. However, I seem to be replacing every column within the dataframe with the weight column. I'm sure I'm missing something incredibly stupid/basic here--I'm rather new to pandas and Python as a whole. What am I doing wrong?
celebfile = pd.read_csv(celebcsv)
celebframe = pd.DataFrame(celebfile)
behaviorfile = pd.read_csv(behaviorcsv)
behaviorframe = pd.DataFrame(behaviorfile)
celebbehavior = pd.merge(celebframe, behaviorframe, how ='inner', on = 'RespID')
celebbehavior2 = celebbehavior.copy()
def multiplycolumns(column):
    for column in celebbehavior:
        return celebbehavior[column]*celebbehavior['WEIGHT']
celebbehavior2 = celebbehavior2.apply(lambda column: multiplycolumns(column), axis=0)
print(celebbehavior2.head())
Upvotes: 2
Views: 4478
Reputation: 13955
You can use the `mul` method to multiply the columns. However, just FYI, if you do want to use apply, bear in mind the following:
The apply function passes each series in the dataframe to the function. This looping is inherent to apply, so the loop inside your function is redundant. Worse, the return statement inside that loop exits on the first iteration, so every call returns the same product (the first column times the weights) regardless of which column was passed in. That is what causes the behavior you do not want.
Since each column is passed as the argument automatically, all you need to do is tell the function what to multiply it by: in this case, your weights series.
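To see the effect of the early return, here is a minimal sketch on made-up data (the column names `a` and `b` and the function name `multiply_first_only` are hypothetical, not from the question). It mirrors the structure of the original function rather than reproducing the asker's frame:

```python
import pandas as pd

# Hypothetical toy data standing in for the merged frame.
df = pd.DataFrame({'a': [1, 0, 1],
                   'b': [0, 1, 1],
                   'WEIGHT': [0.5, 0.25, 0.1]})

def multiply_first_only(column):
    # The argument is ignored: the loop variable shadows it, and the
    # return exits on the first iteration, i.e. on the first column label.
    for column in df:
        return df[column] * df['WEIGHT']

buggy = df.apply(multiply_first_only, axis=0)

# Every column of the result holds the same values: df['a'] * df['WEIGHT'].
print(buggy)
```

Whatever column apply hands in, the function always computes the first column times the weights, so all output columns come out identical.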
Here is an implementation using apply. Of course, the undesirable part here is that the weights are also multiplied by themselves:
import pandas as pd

df = pd.DataFrame({'1' : [1, 1, 0, 1],
                   '2' : [0, 0, 1, 0],
                   'weights' : [0.5, 0.25, 0.1, 0.05]})

def multiply_columns(column, weights):
    # apply passes one column at a time; multiply it element-wise by the weights
    return column * weights

df.apply(lambda x: multiply_columns(x, df['weights']))
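One possible way around the weights being multiplied by themselves is to apply the function only to the other columns. This is a sketch under the assumption that every column except `weights` should be scaled; `value_cols` and `weighted` are names introduced here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'1': [1, 1, 0, 1],
                   '2': [0, 0, 1, 0],
                   'weights': [0.5, 0.25, 0.1, 0.05]})

# Apply only to the non-weight columns, so 'weights' is left unscaled.
value_cols = df.columns.drop('weights')
weighted = df.copy()
weighted[value_cols] = df[value_cols].apply(lambda col: col * df['weights'])

print(weighted)
```

Working on a copy keeps the original frame intact, which may or may not matter for your use case.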
Upvotes: 1
Reputation: 294238
`read_csv` returns a `pd.DataFrame`... Not necessary to use `pd.DataFrame` on top of it.
`mul` with `axis=0`: You can use `apply`, but that is awkward. Use `mul(axis=0)`... This should be all you need.
df = pd.read_csv(celebcsv).merge(pd.read_csv(behaviorcsv), on='RespID')
df = df.mul(df['WEIGHT'], axis=0)
You said that it looks like you are just replacing every column with the weights column? Are your other columns all ones?
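As a rough sketch of the `mul(axis=0)` approach on made-up data: the column names `likes_a` and `likes_b` are hypothetical, and restricting the multiplication to the binary columns is an assumption about what you want, since multiplying the whole frame would also scale `RespID` and `WEIGHT`.

```python
import pandas as pd

# Hypothetical stand-in for the merged frame; column names other than
# RespID and WEIGHT are made up.
df = pd.DataFrame({'RespID': [101, 102, 103],
                   'likes_a': [1, 0, 1],
                   'likes_b': [0, 1, 1],
                   'WEIGHT': [0.5, 0.25, 0.1]})

# Scale only the binary columns; leave the ID and the weights alone.
binary_cols = ['likes_a', 'likes_b']
weighted = df.copy()
weighted[binary_cols] = df[binary_cols].mul(df['WEIGHT'], axis=0)

# Without axis=0, mul aligns the Series index (0, 1, 2) with the column
# labels, finds no matches, and returns all NaN.
misaligned = df[binary_cols].mul(df['WEIGHT'])

print(weighted)
```

The `axis=0` argument is what makes the weights line up row by row instead of being matched against column names.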
Upvotes: 1