jayko03

Reputation: 2481

python dataframe creating new column

I am using plotly, and in their documentation I saw this statement:

df['text'] = df['state'] + '<br>' +\
    'Beef '+df['beef']+' Dairy '+df['dairy']+'<br>'+\
    'Fruits '+df['total fruits']+' Veggies ' + df['total veggies']+'<br>'+\
    'Wheat '+df['wheat']+' Corn '+df['corn']

Plotly creating map
Without a second thought, I tried to adapt this code to my dataset:

df_region["text"] = (df_region["addr_state"] + '<br>' +
                    "Total loan amount ($ USD): " + df_region["loan_amnt"] + "<br>" +
                    "Avg loan amount ($ USD): " + df_region["avg_loan_amnt_by_state"] + '<br>' +
                    "Avg employment length (Years): " + df_region["avg_emp_length_by_state"])

But I got this error message:

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U33') dtype('<U33') dtype('<U33')

All of my columns used to be numeric (int64 or float64) except addr_state. Later, I changed all column types to object, but still had no luck. Can someone give me a hint about what I am missing here?

The head of my dataset looks like this:

     amnt       num     avg_loan   emp_length   addr_state
1   36978050    2205    16770       6.00            AK
2   164627650   11200   14698       6.15            AL
3   93416075    6640    14068       5.90            AR
4   290110100   20412   14212       5.37            AZ
5   1898145250  129517  14655       5.66            CA

Thanks!

Upvotes: 1

Views: 128

Answers (2)

fodma1

Reputation: 3535

String construction with addition is often considered bad practice. Try a template instead, formatted once per row:

template = '''{addr_state}<br>
Total loan amount ($ USD): {loan_amnt}<br>
Avg loan amount ($ USD): {avg_loan_amnt_by_state}<br>
Avg employment length (Years): {avg_emp_length_by_state}'''

# Format row by row: passing whole columns to format() would embed the
# printed form of each Series instead of the per-row values.
df_region["text"] = df_region.apply(
    lambda row: template.format(
        addr_state=row['addr_state'],
        loan_amnt=row['loan_amnt'],
        avg_loan_amnt_by_state=row['avg_loan_amnt_by_state'],
        avg_emp_length_by_state=row['avg_emp_length_by_state'],
    ),
    axis=1,
)

Or, if you are using Python 3.6 or later, you can skip the format call and use an f-string, building the text row by row:

df_region["text"] = df_region.apply(lambda r: f'''{r['addr_state']}<br>
Total loan amount ($ USD): {r['loan_amnt']}<br>
Avg loan amount ($ USD): {r['avg_loan_amnt_by_state']}<br>
Avg employment length (Years): {r['avg_emp_length_by_state']}''', axis=1)

The best thing about format calls is that they convert values to strings under the hood (through __format__, which falls back to __str__ by default), so you don't have to care about conversion unless your data can't be represented as a string. You also have control over decimal places and other formatting options through format specifiers such as {:.2f}.
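As a rough illustration of the decimal-place control mentioned above, here is a minimal sketch. The two-row frame is made up from the question's sample values, and the keyword names `state`, `amnt`, and `emp` are hypothetical placeholders chosen for this example only.

```python
import pandas as pd

# Hypothetical two-row frame; values are taken from the question's sample,
# column names follow the question's code.
df_region = pd.DataFrame({
    "addr_state": ["AK", "AL"],
    "loan_amnt": [36978050, 164627650],
    "avg_emp_length_by_state": [6.00, 6.15],
})

# ":," adds thousands separators, ":.2f" fixes two decimal places.
template = (
    "{state}<br>"
    "Total loan amount ($ USD): {amnt:,}<br>"
    "Avg employment length (Years): {emp:.2f}"
)

# Build one string per row by iterating over the columns in parallel.
df_region["text"] = [
    template.format(state=s, amnt=a, emp=e)
    for s, a, e in zip(
        df_region["addr_state"],
        df_region["loan_amnt"],
        df_region["avg_emp_length_by_state"],
    )
]

print(df_region["text"].iloc[0])
```

The list comprehension is used here instead of apply only to keep the per-row formatting explicit; either approach should work for a small frame like this.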

Upvotes: 1

jezrael

Reputation: 862441

I think the simplest approach is to convert all numeric columns to str first:

c = ["loan_amnt", "avg_loan_amnt_by_state", "avg_emp_length_by_state"]
df_region[c] = df_region[c].astype(str)

Or convert each column separately:

df_region["text"] = (df_region["addr_state"] + '<br>' +
                    "Total loan amount ($ USD): " + df_region["loan_amnt"].astype(str) + "<br>" +
                    "Avg loan amount ($ USD): " + df_region["avg_loan_amnt_by_state"].astype(str) + '<br>' +
                    "Avg employment length (Years): " + df_region["avg_emp_length_by_state"].astype(str))
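For anyone who wants to try this end to end, here is a small self-contained sketch. It assumes the column names used in the question's code (the printed head shows shorter names) and uses the first three sample rows. The first part shows that adding a str to a numeric column fails, the second part applies the astype(str) conversion.

```python
import pandas as pd

# First three sample rows from the question; column names are assumed to
# match the ones used in the question's code.
df_region = pd.DataFrame({
    "addr_state": ["AK", "AL", "AR"],
    "loan_amnt": [36978050, 164627650, 93416075],
    "avg_loan_amnt_by_state": [16770, 14698, 14068],
    "avg_emp_length_by_state": [6.00, 6.15, 5.90],
})

# Adding a str to a numeric column is what triggers the TypeError.
try:
    "Total loan amount ($ USD): " + df_region["loan_amnt"]
except TypeError as exc:
    print("Fails as expected:", type(exc).__name__)

# Converting each numeric column to str makes the concatenation element-wise.
df_region["text"] = (
    df_region["addr_state"] + "<br>"
    + "Total loan amount ($ USD): " + df_region["loan_amnt"].astype(str) + "<br>"
    + "Avg loan amount ($ USD): " + df_region["avg_loan_amnt_by_state"].astype(str) + "<br>"
    + "Avg employment length (Years): " + df_region["avg_emp_length_by_state"].astype(str)
)

print(df_region["text"].iloc[0])
```

Note that astype(str) keeps the default string form of each number, so 6.00 shows up as 6.0.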

Upvotes: 2
