Luisa

Reputation: 43

How can I convert a string to a numpy.array inside a DataFrame column?

My DataFrame db is built from a csv file, using read_csv. Values of column A look like this:

[1,2,5,6,48,125]

The "vector" can have a different length on every row, but each value is still a string. I can strip the [ and ] as follows:

db["A"] = db["A"].str.rstrip(']').str.lstrip('[')

The resulting values, such as 1,2,5,6,48,125, should be good input for np.fromstring. However, I am not able to apply this function in combination with pandas DataFrame.

When I try db["A"] = np.fromstring(db["A"], sep=','), I get: a bytes-like object is required, not 'Series'. Using apply does not work either. Thanks for any tips.

Upvotes: 0

Views: 2138

Answers (3)

AvidJoe

Reputation: 746

np.fromstring() is built for this purpose, as you (OP) already pointed out. The problem is that it expects a single string, but you are passing it the whole Series, hence the error.

Applying it to each value separately addresses the problem:

import pandas as pd
import numpy as np

dataframe = pd.DataFrame({'data': ["[1,2,4]", "[1,2,4,5]","[1,2,4,5,6]"]})
dataframe['data'] = dataframe['data'].apply(lambda x : np.fromstring(str(x).replace('[','').replace(']',''), sep=','))

Each value in the column will then be a 1-D NumPy array.

Running dataframe.head() gives me this:

    data
0   [1.0, 2.0, 4.0]
1   [1.0, 2.0, 4.0, 5.0]
2   [1.0, 2.0, 4.0, 5.0, 6.0]
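
As the output shows, np.fromstring returns floats by default. If integer arrays are wanted, passing dtype=int should work. A minimal sketch, assuming the same sample data as above; parse_int_vector is just an illustrative helper name, not something from the question:

```python
import numpy as np
import pandas as pd


def parse_int_vector(text):
    # Hypothetical helper: strip the surrounding brackets, then parse the
    # comma-separated numbers as integers instead of the default floats.
    return np.fromstring(str(text).strip("[]"), dtype=int, sep=",")


dataframe = pd.DataFrame({"data": ["[1,2,4]", "[1,2,4,5]", "[1,2,4,5,6]"]})
dataframe["data"] = dataframe["data"].apply(parse_int_vector)
```

Each cell still holds its own array, so rows can keep different lengths.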

Upvotes: 0

Hue

Reputation: 59

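import ast
import numpy as np
arrays = []
for i in range(len(db)):
    # parse the string in row i, e.g. "[1,2,5]", into a NumPy array
    arrays.append(np.array(ast.literal_eval(db["A"].iloc[i])))
db["A"] = arrays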

Upvotes: 0

jezrael

Reputation: 862406

One idea is to convert the values to lists and then to np.array:

import ast
import numpy as np

db["A"] = db["A"].apply(lambda x: np.array(ast.literal_eval(x)))
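
Since db comes from read_csv, the same conversion could probably be done while loading, through the converters argument. A rough sketch under assumptions: csv_text is a made-up stand-in for the real file, the id column is invented for illustration, and to_array is a hypothetical helper name:

```python
import ast
import io

import numpy as np
import pandas as pd

# Made-up stand-in for the real csv file. The list column is quoted
# because its values contain commas.
csv_text = 'id,A\n1,"[1,2,5,6,48,125]"\n2,"[3,4]"\n'


def to_array(text):
    # Hypothetical helper: literal_eval accepts only Python literals
    # (unlike eval), and the resulting list is wrapped in a NumPy array.
    return np.array(ast.literal_eval(text))


# converters maps a column name to a function applied to each raw cell string
db = pd.read_csv(io.StringIO(csv_text), converters={"A": to_array})
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the file path.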

Upvotes: 2
