Chan

Reputation: 4301

How to get numpy data dtype when reading csv?

I have a DataFrame with a column that contains NumPy arrays.

I saved it to a CSV file.

After loading the CSV file, I found that the column that contained the NumPy arrays now holds strings.

How can I convert it back to NumPy arrays using read_csv?

import pandas as pd
import numpy as np

df = pd.DataFrame(columns = ['name', 'sex'])
df.loc[len(df), :] = ['Sam', 'M']
df.loc[len(df), :] = ['Mary', 'F']
df.loc[len(df), :] = ['Ann', 'F']

# insert one np.array per row (avoids chained assignment such as df['data'][0] = ...)
df['data'] = [np.array([2, 5, 7]), np.array([6, 4, 8]), np.array([9, 2, 1])]

# save to csv file
df.to_csv('data.csv', index=False)
# load csv file
df2 = pd.read_csv('data.csv')  # data column becomes string, how to change it to np.array?

Upvotes: 3

Views: 2533

Answers (2)

shivsn

Reputation: 7848

It's a workaround: split each string on spaces, strip the brackets, cast to int, then wrap each row in np.array:

In [114]: df2['data'] = df2.data.str.split(' ', expand=True).replace(r'\[|\]', '', regex=True).astype(int).values.tolist()

In [115]: df2['data'] = [np.array(i) for i in df2.data]

In [116]: df2.loc[0,'data']
Out[116]: array([2, 5, 7])
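
Since the question asks about doing this in read_csv itself, here is a minimal sketch of one possible approach, assuming the column was written in NumPy's default string form such as `[2 5 7]`. The `converters` argument of `pd.read_csv` applies a function to every cell of the named column while parsing. The helper name `parse_array` is hypothetical, and an in-memory `io.StringIO` buffer stands in for data.csv so the snippet runs on its own.

```python
import io

import numpy as np
import pandas as pd


def parse_array(cell):
    # Hypothetical helper: turn the text "[2 5 7]" back into an integer array.
    # str.split() with no argument also tolerates the padding NumPy adds
    # when the numbers have different widths, e.g. "[ 2 10  7]".
    return np.array([int(tok) for tok in cell.strip("[]").split()])


# Stand-in for data.csv as written by df.to_csv('data.csv', index=False)
csv_text = "name,sex,data\nSam,M,[2 5 7]\nMary,F,[6 4 8]\nAnn,F,[9 2 1]\n"

# The converter runs on each cell of the 'data' column during parsing
df2 = pd.read_csv(io.StringIO(csv_text), converters={"data": parse_array})

print(type(df2.loc[0, "data"]))
print(df2.loc[0, "data"])
```

With a real file, the same call would be `pd.read_csv('data.csv', converters={'data': parse_array})`. This sketch only handles one-dimensional integer arrays; floats or nested arrays would need a different parser.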

Upvotes: 1

Sreekiran A R

Reputation: 3421

pandas supports only a small set of column dtypes, mainly object, float, int, bool, datetime, timedelta and category, so lists, strings, arrays and so on are all stored as the object dtype. You can read more about it at http://pbpython.com/pandas_dtypes.html. The astype function converts only between these dtypes, so it cannot parse the saved strings back into arrays.
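
To illustrate the point about object columns, here is a small sketch (not from the original answer) that round-trips a column of arrays through CSV text and inspects the element types. It uses an in-memory `io.StringIO` buffer in place of a file, and a shortened two-row frame as an assumption for brevity.

```python
import io

import numpy as np
import pandas as pd

# A column of arrays is stored with the generic object dtype
df = pd.DataFrame({"name": ["Sam", "Mary"]})
df["data"] = [np.array([2, 5, 7]), np.array([6, 4, 8])]
print(df["data"].dtype)

# CSV is plain text, so each array is written as its string form
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

# Before the round trip each element is an ndarray; afterwards it is a str
print(type(df["data"].iloc[0]))
print(type(df2["data"].iloc[0]))
print(repr(df2["data"].iloc[0]))
```

The column dtype alone does not say what the cells contain, which is why the strings have to be parsed explicitly after loading.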

Upvotes: 0
