kodachrome

Reputation: 350

Multiplying columns by another column in a dataframe

(Full disclosure that this is related to another question I asked, so bear with me if I should have appended it to what I wrote previously, even though the problem is different.)

I have a dataframe consisting of a column of weights and columns containing binary values of 0 and 1. I'd like to multiply every column within the dataframe by the weights column. However, I seem to be replacing every column within the dataframe with the weight column. I'm sure I'm missing something incredibly stupid/basic here--I'm rather new to pandas and python as a whole. What am I doing wrong?

celebfile = pd.read_csv(celebcsv)
celebframe = pd.DataFrame(celebfile) 
behaviorfile = pd.read_csv(behaviorcsv)
behaviorframe = pd.DataFrame(behaviorfile)
celebbehavior = pd.merge(celebframe, behaviorframe, how ='inner', on = 'RespID')
celebbehavior2 = celebbehavior.copy()
def multiplycolumns(column):
    for column in celebbehavior:
        return celebbehavior[column]*celebbehavior['WEIGHT']
celebbehavior2 = celebbehavior2.apply(lambda column: multiplycolumns(column), axis=0)
print(celebbehavior2.head())

Upvotes: 2

Views: 4478

Answers (3)

Woody Pride

Reputation: 13955

You can use the `mul` method to multiply the columns. However, if you do want to use apply, bear the following in mind:

The apply function passes each series (column) in the dataframe to your function in turn; the looping is inherent to apply. So the first thing to say is that the loop inside your function is redundant. It also contains a return statement, which exits on the first iteration, so every call returns the same product regardless of which column was passed in. That is what causes the behavior you do not want.
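To make the effect of the early return concrete, here is a minimal sketch on a toy frame. The column names 'a' and 'b' and the function name multiply_first_only are made up for illustration; only 'WEIGHT' comes from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the merged frame.
df = pd.DataFrame({'a': [1, 0, 1],
                   'b': [0, 1, 1],
                   'WEIGHT': [0.5, 0.25, 0.1]})

def multiply_first_only(column):
    # The loop variable shadows the argument, and the return on the
    # first iteration exits before any other column is reached.
    for column in df:
        return df[column] * df['WEIGHT']

# apply calls the function once per column, but each call returns
# the same thing: the first column ('a') times WEIGHT.
result = df.apply(multiply_first_only, axis=0)
print(result)
```

Every column of result, including 'WEIGHT', ends up holding the product of the first column and the weights.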

Since each column is passed as the argument automatically, all you need to do is tell the function what to multiply it by: in this case, your weights series.

Here is an implementation using apply. The drawback, of course, is that the weights column is also multiplied by itself:

import pandas as pd

df = pd.DataFrame({'1' : [1, 1, 0, 1],
                   '2' : [0, 0, 1, 0],
                   'weights' : [0.5, 0.25, 0.1, 0.05]})

def multiply_columns(column, weights):
    return column * weights

df.apply(lambda x: multiply_columns(x, df['weights']))
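One way to keep the weights from being multiplied by themselves is to apply the function only to the other columns and add the weights back afterwards. This is a sketch on the same toy frame; value_cols and weighted are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'1': [1, 1, 0, 1],
                   '2': [0, 0, 1, 0],
                   'weights': [0.5, 0.25, 0.1, 0.05]})

# Every column label except the weights column.
value_cols = df.columns.drop('weights')

# Multiply only the value columns; the weights are read, not modified.
weighted = df[value_cols].apply(lambda col: col * df['weights'])

# Put the untouched weights back alongside the weighted columns.
weighted['weights'] = df['weights']
print(weighted)
```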

Upvotes: 1

piRSquared

Reputation: 294238

read_csv
already returns a pd.DataFrame, so there is no need to wrap the result in pd.DataFrame.

mul with axis=0
You can use apply, but that is awkward. Use mul(axis=0) instead; this should be all you need.

df = pd.read_csv(celebcsv).merge(pd.read_csv(behaviorcsv), on='RespID')
df = df.mul(df['WEIGHT'], axis=0)
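Note that multiplying the whole merged frame also scales RespID and WEIGHT themselves. The sketch below restricts the multiplication to the remaining columns. It uses small in-memory frames in place of the CSV files, and the column names likes_x and likes_y, as well as the choice of which file holds WEIGHT, are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files in the question.
celeb = pd.DataFrame({'RespID': [1, 2, 3],
                      'WEIGHT': [0.5, 2.0, 1.5]})
behavior = pd.DataFrame({'RespID': [1, 2, 3],
                         'likes_x': [1, 0, 1],
                         'likes_y': [0, 1, 1]})

df = celeb.merge(behavior, on='RespID')

# Everything except the key and the weights gets multiplied.
binary_cols = df.columns.difference(['RespID', 'WEIGHT'])

# axis=0 aligns the weights with the rows, so each row is scaled
# by its own weight.
weighted = df[binary_cols].mul(df['WEIGHT'], axis=0)

# Keep the respondent ID next to the weighted values.
weighted.insert(0, 'RespID', df['RespID'])
print(weighted)
```

If the real files contain non-numeric columns, those would need to be excluded in the same way before calling mul.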

?
You said it looks like every column is just being replaced with the weights column. Are your other columns all ones?

Upvotes: 1

akuiper

Reputation: 214937

You have a return statement inside a for loop, which means the loop body runs only once. To multiply a data frame by a column, you can use the mul method with the correct axis parameter:

celebbehavior.mul(celebbehavior['WEIGHT'], axis=0)
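To see why the axis parameter matters, here is a small sketch comparing axis=0 with the default. The frame, its row labels r1 to r3, and the column 'a' are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 0, 1],
                   'WEIGHT': [0.5, 0.25, 0.1]},
                  index=['r1', 'r2', 'r3'])

# axis=0: the Series index is matched against the row labels,
# so each row is multiplied by its own weight.
by_rows = df.mul(df['WEIGHT'], axis=0)

# Default axis ('columns'): the Series index is matched against the
# column labels instead. Row labels and column labels share nothing
# here, so every cell of the result is NaN.
by_cols = df.mul(df['WEIGHT'])

print(by_rows)
print(by_cols)
```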

Upvotes: 2
