Replacing string by numpy array in a pandas DataFrame

Question

I have a csv file that looks like this:

A, B
34, "1.0, 2.0"
24, "3.0, 4.0"

I'm reading the file using pandas:

import numpy as np
import pandas as pd
df = pd.read_csv('file.csv')

I need to replace the strings in column B with numpy arrays:

for index, row in df.iterrows():
    df['B'][index] = np.fromstring(df['B'][index], sep=',')

However, this raises the warning "A value is trying to be set on a copy of a slice from a DataFrame" (pandas' SettingWithCopyWarning), even though the numpy arrays themselves are created correctly.

I need all values in B to be of type numpy.ndarray.

Edit: I tried replacing df with row inside the loop:

for index, row in df.iterrows():
    row['flux'] = np.fromstring(row['flux'][index][1:-1], sep=',')

No error is raised, but the values do not change type and the DataFrame still contains strings.

Bruno Mello · Accepted Answer

You can use apply to convert each string in the column to a numpy array:

df['B'] = df['B'].apply(lambda x: np.fromstring(x, sep=','))
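For anyone who wants to try this end to end, here is a minimal self-contained sketch of the apply approach. It is only an illustration under a few assumptions: io.StringIO stands in for file.csv, and skipinitialspace=True is added (it is not in the original question) so that the space after each comma does not interfere with the quoted field or the column name.

```python
import io

import numpy as np
import pandas as pd

# Sample data mirroring the question; io.StringIO stands in for 'file.csv'.
csv_text = 'A, B\n34, "1.0, 2.0"\n24, "3.0, 4.0"\n'

# Assumption: skipinitialspace=True strips the space after each delimiter,
# so the quoted field is parsed as one value and the column is named "B".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Convert every string in column B to a float array in one pass.
# The column keeps dtype object, with one numpy.ndarray per row.
df['B'] = df['B'].apply(lambda s: np.fromstring(s, sep=','))

print(type(df.loc[0, 'B']))
print(df['B'].tolist())
```

Assigning the result of apply back to the whole column avoids the row-by-row chained assignment from the question, which is what triggered the warning there.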
