Reputation: 43
My DataFrame db is built from a CSV file using read_csv. The values of column A look like this:
[1,2,5,6,48,125]
The "vector" can have a different length on every row, but it is still stored as a string. I can strip the [ and ] as follows:
db["A"] = db["A"].str.rstrip(']').str.lstrip('[')
The resulting values, such as 1,2,5,6,48,125, should be good input for np.fromstring. However, I am not able to apply this function to a pandas DataFrame column.
When I try:
db["A"] = np.fromstring(db["A"], sep=',')
it says:
a bytes-like object is required, not 'Series'.
Using apply also does not work. Thanks for any tips.
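np.fromstring parses one string at a time, so passing the whole Series (as in the question) fails; the column has to be converted row by row. A minimal sketch, using a small hypothetical frame in place of the CSV-loaded one:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the frame loaded with read_csv.
db = pd.DataFrame({"A": ["[1,2,5,6,48,125]", "[3,4]"]})

# np.fromstring works on a single string:
single = np.fromstring("1,2,5,6,48,125", sep=",")

# For a column, call it once per row via Series.apply.
# str.strip("[]") removes both brackets from both ends in one call.
db["A"] = db["A"].str.strip("[]").apply(lambda s: np.fromstring(s, sep=","))
```

Each cell of db["A"] then holds its own 1-D float array, so rows of different lengths are not a problem.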
Upvotes: 0
Views: 2138
Reputation: 746
np.fromstring() is built for this purpose, as you (OP) already pointed out. The problem is that it expects a single string, but here it receives the whole Series.
Applying it row by row addresses the problem:
import pandas as pd
import numpy as np

dataframe = pd.DataFrame({'data': ["[1,2,4]", "[1,2,4,5]", "[1,2,4,5,6]"]})
# For each row: remove the brackets, then let np.fromstring split the rest on commas
dataframe['data'] = dataframe['data'].apply(lambda x: np.fromstring(str(x).replace('[', '').replace(']', ''), sep=','))
Each cell then holds a 1-D NumPy array of floats.
Running dataframe.head() gives:
data
0 [1.0, 2.0, 4.0]
1 [1.0, 2.0, 4.0, 5.0]
2 [1.0, 2.0, 4.0, 5.0, 6.0]
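To get integer arrays instead of floats, np.fromstring also accepts a dtype argument. A minimal sketch on the same sample frame, using Python's str.strip('[]') to drop both brackets in one call:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'data': ["[1,2,4]", "[1,2,4,5]", "[1,2,4,5,6]"]})

# dtype=int makes np.fromstring parse each comma-separated token as an integer
dataframe['data'] = dataframe['data'].apply(
    lambda x: np.fromstring(x.strip('[]'), dtype=int, sep=',')
)
```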
Upvotes: 0
Reputation: 59
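import numpy as np
# Parse each row's string separately, collect the arrays, then assign the column once.
arrays = []
for i in range(len(db)):  # len(db), not len(db)-1, so the last row is included
    s = db.iloc[i]["A"].strip("[]")
    arrays.append(np.array([float(v) for v in s.split(",")]))
db["A"] = arrays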
Upvotes: 0
Reputation: 862406
One idea is to convert the values to lists and then to np.array:
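import ast
import numpy as np
# ast.literal_eval safely parses a string like "[1,2,5]" into the list [1, 2, 5]
db["A"] = db["A"].apply(lambda x: np.array(ast.literal_eval(x)))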
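Since the frame comes from read_csv anyway, the same conversion can probably be done while reading, through read_csv's converters parameter, which maps a column name to a function applied to each raw cell string. A minimal sketch; the CSV text and the id column are hypothetical stand-ins for the real file:

```python
import ast
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents; in practice, pass the path to the real file instead.
csv_text = 'id,A\n1,"[1,2,5,6,48,125]"\n2,"[3,4]"\n'

# converters: column A is turned into NumPy arrays as the file is parsed,
# so no separate conversion pass over the DataFrame is needed.
db = pd.read_csv(
    io.StringIO(csv_text),
    converters={"A": lambda s: np.array(ast.literal_eval(s))},
)
```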
Upvotes: 2