user1472131

Reputation: 421

pandas DataFrame: persisting a list of numbers and then reading it back

I have a DataFrame with a column that holds lists of numbers (positive and negative). I save the DataFrame to a CSV file, and when I read it back each list comes back as a string. Converting it back to a list is awkward: Python complains about the square brackets and the minus sign. Is there a way to persist lists of numbers and read them back as lists of numbers?

import pandas as pd

data = [['tom', [10, -5, 3]], ['dave', [15, -1, 4]], ['al', [14, -1, -1]]]
df1 = pd.DataFrame(data, columns=['Name', 'Points'])
df1.to_csv("points.csv")
df2 = pd.read_csv("points.csv")

The Points column in df2 holds strings. How can I convert it back to lists of numbers?

Upvotes: 0

Views: 553

Answers (2)

smci

Reputation: 33938

Don't store your data as a Python list inside a pandas DataFrame. That is going to be a pain to write out as CSV and read back, because the types get mangled. (Pickle or JSON would preserve them, but why create unnecessary complications?)

Easier to simply store as a native pandas dataframe:

df3 = pd.DataFrame({'tom': [10,-5,3], 'dave': [15,-1,4], 'al': [14,-1,-1]})

df3
   tom  dave  al
0   10    15  14
1   -5    -1  -1
2    3     4  -1

df3.to_csv('my.csv', index=False)

# Now when we read it back in, the integer columns remain integer...
df3in = pd.read_csv('my.csv')

df3in
   tom  dave  al
0   10    15  14
1   -5    -1  -1
2    3     4  -1

df3in.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
tom     3 non-null int64
dave    3 non-null int64
al      3 non-null int64
dtypes: int64(3)
memory usage: 152.0 bytes
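
If the data already exists in the Name/Points layout from the question, the wide layout above can be derived from it rather than retyped. The following is a minimal sketch, not part of the original answer; it assumes every list has the same length, and the names wide and back are made up for illustration.

```python
import pandas as pd

data = [['tom', [10, -5, 3]], ['dave', [15, -1, 4]], ['al', [14, -1, -1]]]
df1 = pd.DataFrame(data, columns=['Name', 'Points'])

# One column per name, one row per list position.
# Assumption: all lists have the same length.
wide = pd.DataFrame(dict(zip(df1['Name'], df1['Points'])))

# Going the other way: rebuild the Name/Points layout from the wide frame.
back = pd.DataFrame({
    'Name': list(wide.columns),
    'Points': [wide[col].tolist() for col in wide.columns],
})
```

The wide frame is the one that survives a CSV round trip with its integer columns intact; back is only needed if later code expects the list column.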

Upvotes: 1

jezrael

Reputation: 862671

You can use pickle here, with DataFrame.to_pickle and read_pickle. CSV is a plain-text format, so a list is written out as its string representation and read back as a string:

import pandas as pd

data = [['tom', [10, -5, 3]], ['dave', [15, -1, 4]], ['al', [14, -1, -1]]]
df1 = pd.DataFrame(data, columns=['Name', 'Points'])

df1.to_pickle("points.pkl")
df2 = pd.read_pickle("points.pkl")
print(type(df2.loc[0, 'Points']))
<class 'list'>
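
If the file has to stay CSV, the string can probably be parsed back at read time instead. This is a sketch of an alternative to pickle, not something from the answer above: it passes ast.literal_eval as a converter for the Points column, and it assumes each cell holds a valid Python list literal such as "[10, -5, 3]". The temporary path is only there to keep the example self-contained.

```python
import ast
import os
import tempfile

import pandas as pd

data = [['tom', [10, -5, 3]], ['dave', [15, -1, 4]], ['al', [14, -1, -1]]]
df1 = pd.DataFrame(data, columns=['Name', 'Points'])

# Hypothetical temporary location, used so the sketch leaves no files behind
# in the working directory.
csv_path = os.path.join(tempfile.mkdtemp(), 'points.csv')
df1.to_csv(csv_path, index=False)

# Each Points cell is read as text; literal_eval turns that text back into
# a Python list, including the negative numbers.
df2 = pd.read_csv(csv_path, converters={'Points': ast.literal_eval})
print(type(df2.loc[0, 'Points']))
```

literal_eval only accepts Python literals, so unlike eval it will not execute arbitrary code found in the file.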

Upvotes: 2
