vineeth venugopal

Reputation: 1106

numpy array changes to string when writing to file

I have a dataframe in which one column holds numpy arrays:

 DF

      Name                     Vec
 0  Abenakiite-(Ce) [0.0, 0.0, 0.0, 0.0, 0.0, 0.043, 0.0, 0.478, 0...
 1  Abernathyite    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
 2  Abhurite        [0.176, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.235, 0...
 3  Abswurmbachite  [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25, 0.0,...

When I check the data type of each element, the correct data type is returned.

 type(DF['Vec'].iloc[1])
 numpy.ndarray

I save this into a csv file:

DF.to_csv('.\\file.csv',sep='\t')

Now, when I read the file again,

new_DF=pd.read_csv('.\\file.csv',sep='\t')

and check the datatype of Vec at index 1:

type(new_DF['Vec'].iloc[1])   
str

The size of the numpy array is 1x127.

The data type has changed from a numpy array to a string. I can also see newline characters inside the individual vectors. I think something goes wrong when the vectors are written to the csv, but I don't know how to fix it. Can someone please help?

Thanks!

Upvotes: 4

Views: 3453

Answers (2)

coffeenino

Reputation: 11

anishtain4's answer works. If you get empty arrays back, slice the string with [1:-1] first!

This converts the string [-2.0797753, 3.6340227, -1.7011836]

to -2.0797753, 3.6340227, -1.7011836

which is the format np.fromstring expects (here with sep=','): https://numpy.org/doc/stable/reference/generated/numpy.fromstring.html
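A minimal sketch of that slicing step, assuming the stored string is comma-separated as in the example above. The helper name parse_vec is made up for illustration:

```python
import numpy as np

def parse_vec(text):
    # Hypothetical helper: drop the surrounding brackets, then parse
    # the comma-separated numbers that remain.
    return np.fromstring(text.strip()[1:-1], sep=',')

vec = parse_vec('[-2.0797753, 3.6340227, -1.7011836]')
print(type(vec), vec.shape)
```

If the arrays were written as plain numpy arrays rather than lists, the numbers are separated by spaces instead, and sep=' ' is the one to use.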

Upvotes: 0

anishtain4

Reputation: 2402

In the comments I mistakenly said dtype instead of converters. What you want is to convert the strings back into arrays as you read them, by passing a function through the converters argument of read_csv. With some dummy variables:

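import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['name1', 'name2'], 'Vec': [np.array([1, 2]), np.array([3, 4])]})
df.to_csv('tmp.csv')

def converter(instr):
    # Strip the surrounding brackets, then parse the space-separated numbers
    return np.fromstring(instr[1:-1], sep=' ')

df1 = pd.read_csv('tmp.csv', converters={'Vec': converter})
df1.iloc[0, 2]  # column 0 is the saved index, so Vec is column 2
array([1., 2.])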
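For vectors as long as the ones in the question, the text written to the csv wraps over several lines, which is where the newlines come from. The sketch below runs the same converter on 127-element dummy vectors. It assumes a tab separator as in the question and uses an in-memory buffer in place of the file on disk. The data is invented for illustration:

```python
import io

import numpy as np
import pandas as pd

def converter(instr):
    # Strip the brackets; np.fromstring skips extra whitespace
    # (including newlines) between the numbers.
    return np.fromstring(instr[1:-1], sep=' ')

# Dummy data: one 127-element vector per row, as in the question.
df = pd.DataFrame({'Name': ['a', 'b'],
                   'Vec': [np.linspace(0, 1, 127), np.zeros(127)]})

buf = io.StringIO()  # in-memory stand-in for the csv file
df.to_csv(buf, sep='\t', index=False)
buf.seek(0)

new_df = pd.read_csv(buf, sep='\t', converters={'Vec': converter})
print(type(new_df['Vec'].iloc[1]), new_df['Vec'].iloc[0].shape)
```

Note that the values come back only to numpy's print precision (8 digits by default), and arrays with more than 1000 elements are abbreviated with "..." when turned into text, so this approach suits short vectors best.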

Upvotes: 7
