rum_ham52

Reputation: 11

Problems with Python & Pandas: adding a calculated column that uses values from a function raises an error

I am trying to add a calculated column, "NetEarning", to my DataFrame "wages". The "NetEarning" column subtracts Tax from AnnualIncome, and Tax comes from a function I wrote to calculate taxes. Pandas will not let me add the new column because of this error:

"TypeError: unsupported operand type(s) for -: 'float' and 'tuple'"

I have tried almost everything and am not sure whether I made a simple mistake somewhere. Thanks for your help!

Code:

def Tax(AnnualIncome):
    if (0 < AnnualIncome) & (AnnualIncome <= 21450):
        return (.15 * AnnualIncome)
    elif (21450 < AnnualIncome) & (AnnualIncome <= 51900):
        return (3215.5 + ((AnnualIncome - 21450) * .28))
    else: 
        return (11,743.5 + ((AnnualIncome - 51900) * .31))

wages['Tax'] = wages['AnnualIncome'].apply(Tax)

# Problem line
wages['NetEarning'] = wages['AnnualIncome'] - wages['Tax']


Error:

TypeError: unsupported operand type(s) for -: 'float' and 'tuple'

Upvotes: 1

Views: 126

Answers (3)

Parfait

Reputation: 107587

As much as you can, avoid apply (here Series.apply), which is a hidden Python loop. Since your function uses logical, mutually exclusive conditions, consider numpy.select to compute the assignment on whole arrays rather than on row-wise scalar values:

import numpy as np

# Each condition is evaluated on the whole column at once
condlist = [(0 < wages['AnnualIncome']) & (wages['AnnualIncome'] <= 21450),
            (21450 < wages['AnnualIncome']) & (wages['AnnualIncome'] <= 51900),
            (0 >= wages['AnnualIncome']) | (wages['AnnualIncome'] > 51900)]

# One result array per condition, in the same order as condlist
# (note: 11743.5 with no comma, otherwise Python builds a tuple)
choicelist = [(.15 * wages['AnnualIncome']),
              (3215.5 + ((wages['AnnualIncome'] - 21450) * .28)),
              (11743.5 + ((wages['AnnualIncome'] - 51900) * .31))]

wages['Tax'] = np.select(condlist, choicelist)
wages['NetEarning'] = wages['AnnualIncome'] - wages['Tax']
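A self-contained check that the np.select version agrees with the row-wise function once the comma is removed. This is a sketch that assumes a small hypothetical `wages` DataFrame with made-up incomes; `tax` is the question's function with `11,743.5` corrected to `11743.5`.

```python
import numpy as np
import pandas as pd

def tax(income):
    # Row-wise version: the question's function with the comma bug fixed
    if 0 < income <= 21450:
        return 0.15 * income
    elif 21450 < income <= 51900:
        return 3215.5 + (income - 21450) * 0.28
    else:
        return 11743.5 + (income - 51900) * 0.31

# Hypothetical sample data for illustration only
wages = pd.DataFrame({'AnnualIncome': [10000.0, 21450.0, 30000.0, 60000.0]})
income = wages['AnnualIncome']

condlist = [(0 < income) & (income <= 21450),
            (21450 < income) & (income <= 51900),
            (0 >= income) | (income > 51900)]
choicelist = [0.15 * income,
              3215.5 + (income - 21450) * 0.28,
              11743.5 + (income - 51900) * 0.31]

vectorized = np.select(condlist, choicelist)  # whole-array computation
row_wise = income.apply(tax)                  # one Python call per row

# Both approaches should give the same tax for every row
print(np.allclose(vectorized, row_wise))  # True
```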

Upvotes: 0

kaihami

Reputation: 815

In addition to Ente's answer,

I suggest using np.where instead of apply. apply may be somewhat faster than an explicit for loop, but it is much slower than a vectorized np.where.

A possible solution would be:

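import numpy as np

# Nested np.where: test the lowest bracket first, then the middle one;
# everything else falls through to the top-bracket formula
wages['Tax'] = np.where(wages['AnnualIncome'] <= 21450,
                        .15 * wages['AnnualIncome'],
                        np.where(wages['AnnualIncome'] <= 51900,
                                 3215.5 + (wages['AnnualIncome'] - 21450) * .28,
                                 11743.5 + (wages['AnnualIncome'] - 51900) * .31))
wages['NetEarning'] = wages['AnnualIncome'] - wages['Tax']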
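To check the speed difference between apply and np.where on your own machine, here is a rough timing sketch with the standard `timeit` module. It assumes hypothetical random incomes, and the names `tax` and `tax_where` are illustrative. Absolute timings depend on hardware and pandas/NumPy versions, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

def tax(income):
    # Row-wise tax function (the question's version with the comma bug fixed)
    if 0 < income <= 21450:
        return 0.15 * income
    elif 21450 < income <= 51900:
        return 3215.5 + (income - 21450) * 0.28
    else:
        return 11743.5 + (income - 51900) * 0.31

def tax_where(income):
    # Vectorized version: nested np.where evaluates whole arrays at once
    return np.where(income <= 21450, 0.15 * income,
                    np.where(income <= 51900,
                             3215.5 + (income - 21450) * 0.28,
                             11743.5 + (income - 51900) * 0.31))

# Hypothetical data: 100,000 random positive incomes
rng = np.random.default_rng(0)
wages = pd.DataFrame({'AnnualIncome': rng.uniform(1, 100_000, size=100_000)})
income = wages['AnnualIncome']

# Results match for positive incomes (np.where puts income <= 0 in the
# 15% bracket, whereas the original function sends it to the else branch)
assert np.allclose(income.apply(tax), tax_where(income))

t_apply = timeit.timeit(lambda: income.apply(tax), number=3)
t_where = timeit.timeit(lambda: tax_where(income), number=3)
print(f"apply: {t_apply:.3f}s  np.where: {t_where:.3f}s")
```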

Upvotes: 1

Ente

Reputation: 2462

The following line returns a tuple because of the comma in 11,743.5:

        return (11,743.5 + ((AnnualIncome - 51900) * .31))

Try:

        return (11743.5 + ((AnnualIncome - 51900) * .31))
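Python does not read `11,743.5` as a number with a thousands separator; the comma builds the two-element tuple `(11, 743.5 + ...)`. The sketch below reproduces the error and the fix, assuming a small hypothetical DataFrame; `tax_buggy` and `tax_fixed` are illustrative names containing only the top-bracket branch.

```python
import pandas as pd

def tax_buggy(income):
    # The comma makes this return the tuple (11, 743.5 + ...)
    return (11,743.5 + ((income - 51900) * .31))

def tax_fixed(income):
    # Without the comma this is a single float
    return (11743.5 + ((income - 51900) * .31))

print(type(tax_buggy(60000.0)))  # <class 'tuple'>

# Hypothetical data for illustration only
wages = pd.DataFrame({'AnnualIncome': [60000.0, 80000.0]})

# apply stores the tuples in an object column; subtracting then fails
wages['Tax'] = wages['AnnualIncome'].apply(tax_buggy)
caught = None
try:
    wages['AnnualIncome'] - wages['Tax']
except TypeError as err:
    caught = err
    print("TypeError:", err)

# With the corrected function the new column can be added
wages['Tax'] = wages['AnnualIncome'].apply(tax_fixed)
wages['NetEarning'] = wages['AnnualIncome'] - wages['Tax']
print(wages)
```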

Upvotes: 0
