Michael Schweitzer

Reputation: 105

Python PANDAS: Applying a function to a dataframe, with arguments defined within dataframe

I have a dataframe with headers 'Category', 'Factor1', 'Factor2', 'Factor3', 'Factor4', 'UseFactorA', 'UseFactorB'.

The values in 'UseFactorA' and 'UseFactorB' are each one of the strings ['Factor1', 'Factor2', 'Factor3', 'Factor4'], determined by the value in 'Category'.

I want to generate a column, 'Result', that, for each row, divides the value in the column named by 'UseFactorA' by the value in the column named by 'UseFactorB'.

Take the dataframe below as an example:

Category  Factor1  Factor2  Factor3  Factor4  UseFactorA  UseFactorB
A         1        2        5        8        'Factor1'   'Factor3'
B         2        7        4        2        'Factor3'   'Factor1'

The 'Result' series should be [0.2, 2] (row A: Factor1/Factor3 = 1/5; row B: Factor3/Factor1 = 4/2).
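To make the example reproducible, here is a minimal sketch that builds the sample frame with pandas.DataFrame (column names follow the question's header list) and works out the expected 'Result' values by hand from the named columns:

```python
import pandas as pd

# Sample data from the question; UseFactorA/UseFactorB hold column *names*
df = pd.DataFrame({
    'Category': ['A', 'B'],
    'Factor1': [1, 2],
    'Factor2': [2, 7],
    'Factor3': [5, 4],
    'Factor4': [8, 2],
    'UseFactorA': ['Factor1', 'Factor3'],
    'UseFactorB': ['Factor3', 'Factor1'],
})

# Expected 'Result', computed by hand from the columns each row names
expected = [1 / 5,  # row A: Factor1 / Factor3
            4 / 2]  # row B: Factor3 / Factor1
print(expected)  # [0.2, 2.0]
```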

However, I cannot figure out how to feed the values of UseFactorA and UseFactorB into an index to make this happen. If the columns to use were fixed, I would just write

df['Result'] = df['Factor1']/df['Factor2']

However, when I try

df['Result'] = df[df['UseFactorA']]/df[df['UseFactorB']]

I get the error

ValueError: Wrong number of items passed 3842, placement implies 1

Is there a method for doing what I am trying here?

Upvotes: 0

Views: 55

Answers (2)

Ben Pap

Reputation: 2579

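Here's a one-liner (it assumes the default 0..n-1 integer index, because x is used both as a position and as a row label):

df['Result'] = [df[df['UseFactorA'][x]][x]/df[df['UseFactorB'][x]][x] for x in range(len(df))]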

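How it works:

df[df['UseFactorA']]

returns a DataFrame (one column for each entry of UseFactorA), so the original attempt produces a whole DataFrame rather than one value per row.

df[df['UseFactorA'][x]]

returns a Series: the single column named in row x of UseFactorA.

df[df['UseFactorA'][x]][x]

pulls the value at row x out of that Series.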
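A small sketch of the three steps above on the question's sample data, printing what each expression returns and then running the one-liner (it relies on the default RangeIndex, as noted above):

```python
import pandas as pd

df = pd.DataFrame({
    'Factor1': [1, 2],
    'Factor2': [2, 7],
    'Factor3': [5, 4],
    'Factor4': [8, 2],
    'UseFactorA': ['Factor1', 'Factor3'],
    'UseFactorB': ['Factor3', 'Factor1'],
})

x = 0  # row position; equals the row label only with the default RangeIndex
whole = df[df['UseFactorA']]        # DataFrame: one column per entry of UseFactorA
column = df[df['UseFactorA'][x]]    # Series: the column named in row x ('Factor1')
value = df[df['UseFactorA'][x]][x]  # scalar: that column's value in row x
print(type(whole).__name__, type(column).__name__, value)  # DataFrame Series 1

# The one-liner: one division per row, using the columns that row names
df['Result'] = [df[df['UseFactorA'][x]][x] / df[df['UseFactorB'][x]][x]
                for x in range(len(df))]
print(df['Result'].tolist())  # [0.2, 2.0]
```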

Upvotes: 1

it's-yer-boy-chet

Reputation: 2007

Probably not the prettiest solution (iterrows loops over the frame row by row in Python), but what comes to mind is to iterate over the pairs of factor names and set the 'Result' value at each index:

for i, factors in df[['UseFactorA', 'UseFactorB']].iterrows():
    # Read only the cells in row i, not the whole named columns
    df.loc[i, 'Result'] = df.loc[i, factors['UseFactorA']] / df.loc[i, factors['UseFactorB']]

Edit:

Another option:

def factor_calc_for_row(row):
    factorA = row['UseFactorA']
    factorB = row['UseFactorB']
    return row[factorA] / row[factorB]

df['Result'] = df.apply(factor_calc_for_row, axis=1)
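Both loops above work one row at a time in Python. One vectorized alternative, not taken from either answer, is to translate the column names into column positions and pick one element per row with NumPy fancy indexing. This is a sketch under two assumptions: the candidate columns are exactly Factor1 to Factor4, and every UseFactorA/UseFactorB entry names one of them (a name outside the list would map to NaN and the indexing would fail). pandas once had DataFrame.lookup for this kind of selection, but it was deprecated in 1.2 and removed in 2.0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Factor1': [1, 2],
    'Factor2': [2, 7],
    'Factor3': [5, 4],
    'Factor4': [8, 2],
    'UseFactorA': ['Factor1', 'Factor3'],
    'UseFactorB': ['Factor3', 'Factor1'],
})

factor_cols = ['Factor1', 'Factor2', 'Factor3', 'Factor4']  # assumed candidate columns
values = df[factor_cols].to_numpy()                        # shape (n_rows, 4)
col_pos = {name: j for j, name in enumerate(factor_cols)}  # column name -> position

rows = np.arange(len(df))                          # row positions 0..n-1
a_pos = df['UseFactorA'].map(col_pos).to_numpy()   # numerator column position per row
b_pos = df['UseFactorB'].map(col_pos).to_numpy()   # denominator column position per row

# values[r, a_pos[r]] / values[r, b_pos[r]] for every row r at once
df['Result'] = values[rows, a_pos] / values[rows, b_pos]
print(df['Result'].tolist())  # [0.2, 2.0]
```

Because it works with positions rather than labels, this version does not depend on the frame having a default integer index.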

Upvotes: 1
