6659081

Reputation: 401

Replacing string by numpy array in a pandas DataFrame

I have a csv file that looks like this:

A, B
34, "1.0, 2.0"
24, "3.0, 4.0"

I'm reading the file using pandas:

import numpy as np
import pandas as pd
df = pd.read_csv('file.csv')

What I need to do is to replace the strings by numpy arrays:

for index, row in df.iterrows():
    df['B'][index] = np.fromstring(df['B'][index], sep=',')

However, this raises the SettingWithCopyWarning "A value is trying to be set on a copy of a slice from a DataFrame", even though the numpy arrays themselves are created correctly.

I need all values in B to be of type numpy.ndarray.

Edit: I tried replacing df with row in the loop body:

for index, row in df.iterrows():
    row['flux'] = np.fromstring(row['flux'][index][1:-1], sep=',')

No error is raised, but the type of the values doesn't change and the DataFrame still contains strings.

Upvotes: 1

Views: 1397

Answers (2)

jezrael

Reputation: 862641

Use the converters parameter of read_csv to convert column B to numpy arrays while the file is being read:

import pandas as pd
import numpy as np
from io import StringIO

temp='''A,B
34,"1.0, 2.0"
24,"3.0, 4.0"'''
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), converters={'B':lambda x: np.fromstring(x, sep=',')})

print(df)
    A           B
0  34  [1.0, 2.0]
1  24  [3.0, 4.0]
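
The printed frame looks the same whether B holds lists or arrays, so it can help to check the element types directly. A minimal sketch, assuming the same sample data as above; the names csv_text, parse_array and all_arrays are only illustrative and not part of the original answer:

```python
import numpy as np
import pandas as pd
from io import StringIO

# Same sample data as in the answer above.
csv_text = '''A,B
34,"1.0, 2.0"
24,"3.0, 4.0"'''

def parse_array(text):
    # Parse a comma-separated string such as "1.0, 2.0" into a float array.
    return np.fromstring(text, sep=',')

df = pd.read_csv(StringIO(csv_text), converters={'B': parse_array})

# Check that every cell in B is a numpy array rather than a string.
all_arrays = all(isinstance(value, np.ndarray) for value in df['B'])
print(all_arrays)
```

If the check passes, each cell can be used as a regular array, for example df['B'].iloc[0].sum().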

Upvotes: 2

Bruno Mello

Reputation: 4618

You can use apply to convert each string in column B to a numpy array:

df['B'] = df['B'].apply(lambda x: np.fromstring(x, sep=','))
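
Because the whole column is replaced in a single assignment, this avoids the chained indexing (df['B'][index] = ...) that triggers the warning in the question. A self-contained sketch, assuming the column has already been read in as strings; the small DataFrame here is built by hand for illustration instead of being loaded from file.csv:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the DataFrame read from the CSV file.
df = pd.DataFrame({'A': [34, 24], 'B': ['1.0, 2.0', '3.0, 4.0']})

# Replace the whole column at once; each string becomes a float array.
df['B'] = df['B'].apply(lambda x: np.fromstring(x, sep=','))

first = df['B'].iloc[0]
print(type(first))
```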

Upvotes: 1
