Save/Load numpy array as column in Pandas to csv file

Question

I have the following DataFrame:

index    word      decoded_word                   language
0        potato    [17, 24,  1, 21,  1, 24]       english
1        animal    [21, 13, 23, 18, 21, 25]       english
2        שלום       ...                            hebrew

I want to save it to a CSV file, so I used the following line:

df.to_csv('dataset.csv',encoding='utf8',index=False)

and got the following file:

potato,[17 24  1 21  1 24],english
animals,[21 13 23 18 21 25  4],english
שלום,[21 12  6 24],hebrew

But when I load it back with the following code:

data = pd.read_csv('dataset.csv')
print(type(data['decoded_word'][0]))

the printed type is str, not a NumPy array.

I would like to know if there is a better way to save and load the NumPy arrays.

Thank you.

esocrats · Accepted Answer

That is normal: a CSV file stores only text, not column types, so pandas has to infer each column's type when reading, and there is only so much it can infer. A cell such as [17 24  1 21  1 24] simply looks like a string.
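A small round trip makes this concrete. The sketch below rebuilds the first two rows of the question's DataFrame (values copied from the question) and writes to a temporary directory instead of dataset.csv. to_csv writes each array as its str() text, and read_csv brings that text back as a plain string.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# First two rows of the question's DataFrame; each decoded_word cell is a NumPy array.
df = pd.DataFrame({
    "word": ["potato", "animal"],
    "decoded_word": [np.array([17, 24, 1, 21, 1, 24]),
                     np.array([21, 13, 23, 18, 21, 25])],
    "language": ["english", "english"],
})

# Use a temporary directory so the example does not overwrite a real dataset.csv.
path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
df.to_csv(path, encoding="utf8", index=False)

data = pd.read_csv(path)
cell = data["decoded_word"][0]
print(type(cell))  # <class 'str'>
print(repr(cell))  # '[17 24  1 21  1 24]'
```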

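To solve this simply, convert the column after loading the dataset (so after data = pd.read_csv('dataset.csv')). A cast with astype is not enough, because each cell is a string that has to be parsed, so parse it explicitly (with numpy imported as np):

data['decoded_word'] = data['decoded_word'].apply(lambda s: np.array(s.strip('[]').split(), dtype=int))

This turns every cell back into a numpy.ndarray. If you prefer plain Python lists, call .tolist() on the result of np.array(...).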
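A minimal sketch of that conversion, assuming each cell holds NumPy's default str() text of a 1-D integer array, as in the question's file. parse_array is a hypothetical helper name. The sketch also shows read_csv's converters parameter, which applies the same function while the file is being read instead of afterwards.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def parse_array(text):
    """Hypothetical helper: '[17 24  1 21  1 24]' -> array([17, 24, 1, 21, 1, 24]).

    Assumes the cell is NumPy's str() of a 1-D integer array: brackets around
    whitespace-separated integers (line breaks inside the text are fine too).
    """
    return np.array(text.strip("[]").split(), dtype=int)


df = pd.DataFrame({
    "word": ["potato", "animal"],
    "decoded_word": [np.array([17, 24, 1, 21, 1, 24]),
                     np.array([21, 13, 23, 18, 21, 25])],
    "language": ["english", "english"],
})
path = os.path.join(tempfile.mkdtemp(), "dataset.csv")
df.to_csv(path, encoding="utf8", index=False)

# Option 1: convert the column after loading.
data = pd.read_csv(path)
data["decoded_word"] = data["decoded_word"].apply(parse_array)

# Option 2: let read_csv parse the column while loading.
data2 = pd.read_csv(path, converters={"decoded_word": parse_array})

print(type(data["decoded_word"][0]))  # <class 'numpy.ndarray'>
print(data2["decoded_word"][1])       # [21 13 23 18 21 25]
```

One caveat: NumPy abbreviates arrays with more than 1000 elements (the default threshold in numpy.set_printoptions) using "...", and that text cannot be parsed back. For long arrays a format that stores the values directly, such as pickle, is safer.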

An alternative, if this is possible for you, is to store the DataFrame in another format, e.g., pickle:

df.to_pickle('dataset.pkl')
data = pd.read_pickle('dataset.pkl')

This preserves the column types, so decoded_word comes back as NumPy arrays.
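A sketch of the pickle round trip, again writing to a temporary path instead of dataset.pkl. DataFrame.to_pickle and pandas.read_pickle store and restore the Python objects themselves, so the cells come back as arrays without any parsing step.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "word": ["potato", "animal"],
    "decoded_word": [np.array([17, 24, 1, 21, 1, 24]),
                     np.array([21, 13, 23, 18, 21, 25])],
    "language": ["english", "english"],
})

path = os.path.join(tempfile.mkdtemp(), "dataset.pkl")
df.to_pickle(path)            # serializes the DataFrame, array cells included
data = pd.read_pickle(path)   # restores the same objects

cell = data["decoded_word"][0]
print(type(cell))  # <class 'numpy.ndarray'>
```

Keep in mind that loading a pickle can execute arbitrary code, so in the same spirit as the eval note, only read pickle files you created or trust. Pickle is also Python-specific, unlike CSV.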

Note: I see a comment suggesting that you use eval. That can work if the values are written in Python list syntax (with commas, which NumPy's default text form lacks), but as a rule I prefer never to use eval for manipulating data unless it is the only way and you are very sure there is no security threat.
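If you want to keep a text-based CSV without eval, one option is to write each array as a JSON list and parse it back with json.loads, which only reads data and never executes code. This is a swapped-in technique, not something the answer above proposes. ast.literal_eval would likewise be a safe parser for comma-separated list text. The sketch uses a temporary path.

```python
import json
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "word": ["potato", "animal"],
    "decoded_word": [np.array([17, 24, 1, 21, 1, 24]),
                     np.array([21, 13, 23, 18, 21, 25])],
    "language": ["english", "english"],
})
path = os.path.join(tempfile.mkdtemp(), "dataset.csv")

# Encode each array as a JSON list string such as "[17, 24, 1, 21, 1, 24]".
out = df.assign(decoded_word=df["decoded_word"].map(lambda a: json.dumps(a.tolist())))
out.to_csv(path, encoding="utf8", index=False)  # fields with commas get quoted

# What the file holds for the first row, read back as plain text.
raw = pd.read_csv(path)["decoded_word"][0]

# Decode while loading: json.loads parses the list, np.array rebuilds the array.
data = pd.read_csv(path, converters={"decoded_word": lambda s: np.array(json.loads(s))})
print(type(data["decoded_word"][0]))  # <class 'numpy.ndarray'>
```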
