Daniel Botnik

Reputation: 679

Save/Load numpy array as column in Pandas to csv file

I have the following DataFrame:

index    word      decoded_Word                   language
0        potato    [17, 24,  1, 21,  1, 24]       english
1        animal    [21, 13, 23, 18, 21, 25]       english
2        שלום      ...                            hebrew

I want to save it to a CSV file, so I used the following line:

df.to_csv('dataset.csv',encoding='utf8',index=False)

and got the following file:

potato,[17 24  1 21  1 24],english
animals,[21 13 23 18 21 25  4],english
שלום,[21 12  6 24],hebrew

But when I load the file back and check the type of the first entry in the column:

data = pd.read_csv('dataset.csv')
print(type(data['decoded_Word'][0]))

the result is str, not a NumPy array.

I would like to know if there is a better way to save and load a column of NumPy arrays.

Thank you.

Upvotes: 2

Views: 4662

Answers (2)

esocrats

Reputation: 198

That is expected: a CSV file stores only text, so pandas cannot record the column types in it, and there is only so much it can infer when reading the file back.
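To see the behaviour in isolation, here is a minimal sketch. The toy frame and the in-memory buffer are my own stand-ins for the asker's data and 'dataset.csv'.

```python
import io

import numpy as np
import pandas as pd

# Toy frame mirroring the question: one column holds NumPy arrays.
df = pd.DataFrame({
    "word": ["potato", "animal"],
    "decoded_Word": [
        np.array([17, 24, 1, 21, 1, 24]),
        np.array([21, 13, 23, 18, 21, 25]),
    ],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back: each array was written as its printed text form.
buffer.seek(0)
loaded = pd.read_csv(buffer)

first_cell = loaded["decoded_Word"][0]
print(type(first_cell))  # <class 'str'>
print(first_cell)
```

The array is gone after the round trip; what comes back is the text that str() produced for it.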

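To fix this after loading the dataset (so after data = pd.read_csv('dataset.csv')), parse each string back into an array. With numpy imported as np:

data['decoded_Word'] = data['decoded_Word'].map(lambda s: np.array(s.strip('[]').split(), dtype=int))

This turns each string such as '[17 24  1 21  1 24]' into a numpy.ndarray. Call .tolist() on the result if you want a plain list instead.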
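The conversion can also happen while the file is being read, through the converters parameter of read_csv. This is only a sketch: parse_array is a helper name I made up, and it assumes the cells hold integers in NumPy's space-separated printed form, as in the asker's file. NumPy abbreviates long arrays with "..." when printing (by default above 1000 elements), and that form cannot be parsed back.

```python
import io

import numpy as np
import pandas as pd


def parse_array(text):
    """Turn '[17 24  1 21]' (NumPy's printed form) into an integer array.

    Hypothetical helper for illustration; assumes integer values only.
    """
    return np.array(text.strip("[]").split(), dtype=int)


# Stand-in for 'dataset.csv', using the cell format shown in the question.
csv_text = (
    "word,decoded_Word,language\n"
    "potato,[17 24  1 21  1 24],english\n"
    "animal,[21 13 23 18 21 25],english\n"
)

data = pd.read_csv(
    io.StringIO(csv_text),
    converters={"decoded_Word": parse_array},
)

print(type(data["decoded_Word"][0]))
print(data["decoded_Word"][0])
```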

An alternative, if this is possible for you, is to store the dataframe in another format, e.g., pickle:

data.to_pickle('dataset.pkl')

This should preserve the column types.
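A quick round trip to check this, as a sketch: the toy frame and the temporary directory are assumptions of mine so the example cleans up after itself. Keep in mind that pickle files should only be loaded from sources you trust.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "word": ["potato", "animal"],
    "decoded_Word": [
        np.array([17, 24, 1, 21, 1, 24]),
        np.array([21, 13, 23, 18, 21, 25]),
    ],
})

# Save and reload inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

# The cells come back as arrays, not strings.
print(type(restored["decoded_Word"][0]))  # <class 'numpy.ndarray'>
```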

Note: I see a comment suggesting eval. That should work as well, but as a rule I avoid eval for manipulating data unless it is the only way and you are very sure there is no security threat.
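If the cells are written as Python lists (comma-separated, e.g. '[17, 24, 1]'), ast.literal_eval from the standard library is a safer option than eval, since it only accepts Python literals. A sketch with made-up sample strings; it will not parse NumPy's space-separated form such as '[17 24  1]'.

```python
import ast

import numpy as np
import pandas as pd

# Stand-in for a column read from CSV where each cell is a list written as text.
data = pd.DataFrame({
    "decoded_Word": ["[17, 24, 1, 21, 1, 24]", "[21, 13, 23, 18, 21, 25]"],
})

# literal_eval parses literals only; it does not execute arbitrary code.
data["decoded_Word"] = data["decoded_Word"].map(ast.literal_eval)

# Optional: turn the lists into arrays.
arrays = data["decoded_Word"].map(np.array)

print(type(data["decoded_Word"][0]))  # <class 'list'>
print(type(arrays[0]))                # <class 'numpy.ndarray'>
```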

Upvotes: 2

Anurag Dabas

Reputation: 24322

Before saving, convert the 'decoded_Word' column from np.array to list, then save it to CSV:

df['decoded_Word'] = df['decoded_Word'].map(list)
# Finally, save the CSV:
df.to_csv('dataset.csv', encoding='utf8', index=False)

Now load that file:

data = pd.read_csv('dataset.csv')
# 'decoded_Word' is read back as strings, so turn each one into a real list:
data['decoded_Word'] = pd.eval(data['decoded_Word'])
# Optional: convert the lists to arrays if you need them:
data['decoded_Word'] = data['decoded_Word'].map(np.array)
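A variation on the same idea, if you want to stay with CSV and avoid evaluating the cells: write each array as a JSON string and decode it on load. This is a sketch under my own assumptions (toy data, an in-memory buffer instead of 'dataset.csv'), not the only way to do it.

```python
import io
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "word": ["potato", "שלום"],
    "decoded_Word": [
        np.array([17, 24, 1, 21, 1, 24]),
        np.array([21, 12, 6, 24]),
    ],
    "language": ["english", "hebrew"],
})

# Encode each array as JSON text before saving; the original df is left untouched.
to_save = df.copy()
to_save["decoded_Word"] = to_save["decoded_Word"].map(
    lambda arr: json.dumps(arr.tolist())
)

buffer = io.StringIO()
to_save.to_csv(buffer, index=False)

# Decode the JSON text back into arrays while reading.
buffer.seek(0)
data = pd.read_csv(
    buffer,
    converters={"decoded_Word": lambda s: np.array(json.loads(s))},
)

print(type(data["decoded_Word"][0]))
print(data["decoded_Word"][1])
```

The cells contain commas, so to_csv quotes them, and read_csv hands the unquoted text to the converter.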

Upvotes: 1
