Mike Henderson

Reputation: 2142

Change multiple columns in a DataFrame

I am a beginner in Python and made my first venture into Pandas today. What I want to do is to convert several columns from string to float. Here's a quick example:

import numpy as np
import pandas as pd

def convert(str):
    try:
        return float(str.replace(',', ''))
    except:
        return None

df = pd.DataFrame([
    ['A', '1,234', '456,789'],
    ['B', '1'    , '---'    ]
], columns=['Company Name', 'X', 'Y'])

I want to convert X and Y to float. The real data has more columns, and I don't always know the names of X and Y, so I must use integer indexing.

This works:

df.iloc[:, 1] = df.iloc[:, 1].apply(convert)
df.iloc[:, 2] = df.iloc[:, 2].apply(convert)

This doesn't:

df.iloc[:, 1:2] = df.iloc[:, 1:2].apply(convert)
# Error: could not broadcast input array from shape (2) into shape (2,1)

Is there any way to apply the convert function to multiple columns at once?

Upvotes: 1

Views: 1301

Answers (1)

jpp

Reputation: 164843

There are several issues with your logic:

  1. The slice 1:2 excludes position 2, consistent with list slicing or slice object syntax, so it selects only column X. Use 1:3 instead.
  2. Applying an element-wise function to a series via pd.Series.apply works. To apply an element-wise function to a dataframe, you need pd.DataFrame.applymap (deprecated since pandas 2.1 in favor of pd.DataFrame.map, which does the same thing).
  3. Never shadow built-ins: use mystr or x instead of str as a variable or argument name.
  4. When you use a try / except construct, you should generally specify error type(s), in this case ValueError.
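
Points 1 and 2 can be checked directly. The following is a small sketch that reuses the example dataframe from the question; the names one_col, two_cols and arg_types are only for illustration:

```python
import pandas as pd

df = pd.DataFrame([
    ['A', '1,234', '456,789'],
    ['B', '1', '---'],
], columns=['Company Name', 'X', 'Y'])

# The stop position of an iloc slice is excluded, as with list slicing.
one_col = list(df.iloc[:, 1:2].columns)   # only 'X'
two_cols = list(df.iloc[:, 1:3].columns)  # 'X' and 'Y'

# DataFrame.apply hands each column to the function as a whole Series,
# not one cell at a time, so a function written for a single string fails.
arg_types = df.iloc[:, 1:3].apply(lambda col: type(col).__name__)

print(one_col)
print(two_cols)
print(arg_types.tolist())
```

Because the function receives a Series rather than a string, the original convert hits its except branch for every column, which is why the result had the wrong shape.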

Therefore, this is one solution:

def convert(x):
    try:
        return float(x.replace(',', ''))
    except ValueError:
        return None

df.iloc[:, 1:3] = df.iloc[:, 1:3].applymap(convert)

print(df)

  Company Name     X       Y
0            A  1234  456789
1            B     1     NaN
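
If you are on pandas 2.1 or later, applymap emits a deprecation warning. A minimal sketch of a version-tolerant variant is below; the hasattr check and the names block, elementwise and converted are my own illustration, not something pandas requires:

```python
import pandas as pd

def convert(x):
    # Strip thousands separators; non-numeric strings such as '---' become None.
    try:
        return float(x.replace(',', ''))
    except ValueError:
        return None

df = pd.DataFrame([
    ['A', '1,234', '456,789'],
    ['B', '1', '---'],
], columns=['Company Name', 'X', 'Y'])

block = df.iloc[:, 1:3]

# DataFrame.map exists from pandas 2.1 onward; older versions only have applymap.
# Checking the class avoids confusion with a column that happens to be named 'map'.
elementwise = block.map if hasattr(pd.DataFrame, 'map') else block.applymap
converted = elementwise(convert)

print(converted)
```

The result can then be assigned back to the same positions exactly as above.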

However, your logic is inefficient: you should look to leverage column-wise operations wherever possible. This can be achieved via pd.DataFrame.apply, along with pd.to_numeric applied to each series:

def convert_series(x):
    return pd.to_numeric(x.str.replace(',', ''), errors='coerce')

df.iloc[:, 1:3] = df.iloc[:, 1:3].apply(convert_series)

print(df)

  Company Name     X       Y
0            A  1234  456789
1            B     1     NaN
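
One caveat: depending on your pandas version, assigning through iloc may write the numbers into the existing object columns rather than changing their dtype. If you need numeric dtypes, a possible sketch is to look up the labels by position and assign by label instead (cols is just an illustrative name):

```python
import pandas as pd

def convert_series(x):
    # Remove thousands separators, then coerce anything non-numeric to NaN.
    return pd.to_numeric(x.str.replace(',', ''), errors='coerce')

df = pd.DataFrame([
    ['A', '1,234', '456,789'],
    ['B', '1', '---'],
], columns=['Company Name', 'X', 'Y'])

# Translate integer positions into column labels, so the names need not be known.
cols = df.columns[1:3]

# Assigning by label replaces the columns, so the converted dtypes are kept.
df[cols] = df[cols].apply(convert_series)

print(df.dtypes)
```

This still relies only on integer positions to pick the columns, and it assumes the column labels are unique.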

Upvotes: 1
