cheesus

Reputation: 1181

Pandas series changed through CSV export import

I need to save pandas series and make sure that, once loaded again, they are exactly the same. However, they are not. I tried to manipulate the result in various ways but cannot find a solution. This is my MWE:

import pandas as pd

idx = pd.date_range(start='2010', periods=100, freq='1M')

ts = pd.Series(data=range(100), index=idx)

ts.to_csv('test.csv')
imported_ts = pd.read_csv('test.csv', delimiter=',', index_col=None)

print(ts.equals(imported_ts))

>>> False

What am I doing wrong?

Upvotes: 0

Views: 52

Answers (4)

adir abargil

Reputation: 5745

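What is happening is that read_csv returns a DataFrame, even when the file holds a single data column, and unless told otherwise it reads the saved index back as an ordinary column of strings. In addition, because CSV carries no type information, the full round trip can be more difficult than my suggestion. In that case, see @Serge Ballesta's answer.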

If it's a simple case, try reading the dates back as the index and converting the result:

imported_ts = pd.read_csv('test.csv', index_col=0, parse_dates=True)
print(ts.equals(imported_ts.iloc[:, 0]))
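To see what read_csv actually returns with and without the index options, here is a minimal sketch, assuming pandas 2.x. Two substitutions are mine, not the original poster's: an in-memory io.StringIO buffer stands in for test.csv, and the month-start alias 'MS' replaces '1M' only to avoid the month-end alias rename ('M' to 'ME') introduced in pandas 2.2.

```python
import io

import pandas as pd

# Same shape as the question's series; 'MS' (month start) sidesteps the
# month-end alias rename ('M' -> 'ME') in newer pandas versions.
idx = pd.date_range(start="2010", periods=100, freq="MS")
ts = pd.Series(data=range(100), index=idx)

buf = io.StringIO()  # stands in for test.csv
ts.to_csv(buf)
buf.seek(0)

# Without index options, read_csv returns a DataFrame: the dates land in an
# ordinary column of strings and the index is a fresh RangeIndex.
raw = pd.read_csv(buf)
print(type(raw).__name__, list(raw.columns))

buf.seek(0)
# index_col=0 makes the first column the index again, parse_dates=True turns
# it back into datetimes, and iloc[:, 0] picks out the single data column.
imported_ts = pd.read_csv(buf, index_col=0, parse_dates=True).iloc[:, 0]
print(ts.equals(imported_ts))
```

Series.equals does not compare series names, so the restored name "0" (taken from the CSV header) does not affect the comparison; pd.testing.assert_series_equal, which checks names by default, would flag it.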

Upvotes: 1

cheesus

Reputation: 1181

I resolved this issue by using pickle instead.

import pandas as pd

idx = pd.date_range(start='2010', periods=100, freq='1M')

ts = pd.Series(data=range(100), index=idx)

ts.to_pickle("./test.pkl")

unpickled_ts = pd.read_pickle("./test.pkl")

print(ts.equals(unpickled_ts))


>>> True
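To see why pickle succeeds where CSV does not, the sketch below compares both round trips with pd.testing.assert_series_equal, which is stricter than Series.equals: by default it also checks the series name, the dtypes and the index frequency. A temporary directory replaces ./test.pkl, and 'MS' replaces '1M' to avoid the 'M' to 'ME' alias change in recent pandas; both are my assumptions.

```python
import os
import tempfile

import pandas as pd

idx = pd.date_range(start="2010", periods=100, freq="MS")
ts = pd.Series(data=range(100), index=idx)

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "test.pkl")
    csv_path = os.path.join(tmp, "test.csv")

    # Pickle stores the Python object itself, so dtypes, name and freq survive.
    ts.to_pickle(pkl_path)
    from_pickle = pd.read_pickle(pkl_path)

    # CSV stores only text; even with the right read options, the series name
    # and the index frequency are not recovered.
    ts.to_csv(csv_path)
    from_csv = pd.read_csv(csv_path, index_col=0, parse_dates=True).iloc[:, 0]

pd.testing.assert_series_equal(ts, from_pickle)  # passes

try:
    pd.testing.assert_series_equal(ts, from_csv)
    csv_identical = True
except AssertionError as err:
    csv_identical = False
    print("CSV round trip differs:", err)
```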

Upvotes: 1

Serge Ballesta

Reputation: 148910

You cannot. A pandas Series contains an index and a data column, both having a type (the dtype), a (possibly complex) title which itself has a type, and values.

A CSV file is just a text file containing text representations of the values and, optionally, the text representation of the titles in the first row. Nothing more. When things are simple, meaning the titles are simple strings and all values are integers or small decimals (*), the save-load round trip will give you exactly what you initially had.

But in more complex cases, for example date types, or object dtype columns containing decimal.Decimal values, the generated CSV file will only contain a textual representation with no type information. So the original dtype cannot be reliably recovered from the content of a CSV file alone, which is why the read_csv method has so many options.
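A small sketch of the decimal.Decimal case, using a hypothetical "price" column: after the round trip, read_csv sees only text and infers float64, and one of its options, converters, is needed to rebuild the Decimal objects. An in-memory io.StringIO buffer replaces a file here.

```python
import decimal
import io

import pandas as pd

# Hypothetical object-dtype column holding Decimal values.
prices = pd.DataFrame({"price": [decimal.Decimal("1.10"), decimal.Decimal("2.25")]})

# The CSV text holds only a header line and the digits, no type information.
text = prices.to_csv(index=False)

# Default parsing: the text looks numeric, so the column comes back as float64.
plain = pd.read_csv(io.StringIO(text))
print(plain["price"].dtype)

# converters rebuilds the original objects from their text representation.
typed = pd.read_csv(io.StringIO(text), converters={"price": decimal.Decimal})
print(prices.equals(typed))
```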


(*) By small decimals I mean values with few digits after the decimal point.
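When floats carry many digits, read_csv's float_precision="round_trip" option asks for Python's own float parser, so each value written by to_csv should be read back exactly. A minimal sketch with made-up values, writing to an in-memory buffer instead of a file:

```python
import io

import pandas as pd

# Illustrative values with long binary expansions.
df = pd.DataFrame({"x": [1 / 3, 2 / 3, 0.1 + 0.2, 1e-10 / 7]})

text = df.to_csv(index=False)

# "round_trip" parses each field with Python's float() instead of the
# faster built-in C parser.
restored = pd.read_csv(io.StringIO(text), float_precision="round_trip")
print(df.equals(restored))
```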

Upvotes: 3

Prateek

Reputation: 369

You are saving the dates as the index, but when you read the file back they end up in an ordinary column, so you are comparing them against the values of your DataFrame. Do this instead:

import pandas as pd

idx = pd.date_range(start='2010', periods=100, freq='1M')

ts = pd.Series(data=range(100), index=idx)

ts.to_csv('test.csv')
# parse_dates=True turns the index strings back into datetimes
imported_ts = pd.read_csv('test.csv', delimiter=',', index_col=['Unnamed: 0'], parse_dates=True)

print(ts.index.equals(imported_ts.index))

Gives

True

Upvotes: 0
