Reputation: 1181
I need to save pandas series and make sure that, once loaded again, they are exactly the same. However, they are not. I tried to manipulate the result in various ways but cannot find a solution. This is my MWE:
import pandas as pd
idx = pd.date_range(start='2010', periods=100, freq='1M')
ts = pd.Series(data=range(100), index=idx)
ts.to_csv(f'test.csv')
imported_ts= pd.read_csv('test.csv', delimiter=',', index_col=None)
print(ts.equals(imported_ts))
>>> False
What am I doing wrong?
Upvotes: 0
Views: 52
Reputation: 5745
What is happening is that read_csv returns a DataFrame by default, even when the file holds a single column. In addition, because CSV carries no type information, your case could be more difficult than my suggestion covers; in that case, see @Serge Ballesta's answer.
If it's a simple case, try converting the result to a Series:
print(ts.equals(imported_ts.iloc[:,0]))
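A minimal sketch of that conversion, assuming the CSV has the layout written by `Series.to_csv` with a header row (index in the first column, values in the second). For the comparison to succeed, the index also has to be read back as parsed dates, which is what `index_col=0` and `parse_dates=True` are for here. The temporary directory and the `'MS'` frequency are illustrative choices, not part of the question.

```python
import os
import tempfile

import pandas as pd

# 'MS' (month start) is an illustrative choice; it is accepted across pandas versions.
idx = pd.date_range(start='2010-01-01', periods=100, freq='MS')
ts = pd.Series(data=range(100), index=idx)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.csv')
    # Write the header explicitly so the file layout does not depend on defaults.
    ts.to_csv(path, header=True)
    # Use the first column as the index and parse it as dates.
    imported_df = pd.read_csv(path, index_col=0, parse_dates=True)

# read_csv returns a DataFrame; take its only column to get a Series back.
imported_ts = imported_df.iloc[:, 0]

print(type(imported_df).__name__, type(imported_ts).__name__)
print(ts.equals(imported_ts))
```

If `equals` still reports a difference, comparing `ts.index.dtype` and `ts.dtype` against the imported ones usually shows which part was not restored.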
Upvotes: 1
Reputation: 1181
I resolved this issue by using pickle instead.
import pandas as pd
idx = pd.date_range(start='2010', periods=100, freq='1M')
ts = pd.Series(data=range(100), index=idx)
ts.to_pickle("./test.pkl")
unpickled_df = pd.read_pickle("./test.pkl")
print(ts.equals(unpickled_df))
>>> True
Upvotes: 1
Reputation: 148910
You cannot. A pandas Series contains an index and a data column, both having a type (the dtype), a (possibly complex) title which itself has a type, and values.
A CSV file is just a text file which contains text representations of values and, optionally, the text representation of the title in the first row. Nothing more. When things are simple, meaning the titles are simple strings and all values are integers or small decimals (*), the save-load round trip will give you exactly what you initially had.
But if you have more complex use cases, for example date types, or object dtype columns containing decimal.Decimal values, the generated CSV file will only contain a textual representation with no type information. So it is impossible to be sure of the original dtype just by reading the content of a CSV file, which is why the read_csv function has so many options.
(*) by small decimal I mean a small number of digits after the decimal point.
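As a rough illustration of that loss of type information, the sketch below writes a small frame to CSV text and reads it back twice: once with defaults, and once telling `read_csv` explicitly how to rebuild the types. The column name `price` and the sample values are hypothetical, chosen only for the example.

```python
import decimal
import io

import pandas as pd

# A date index and an object column holding decimal.Decimal values.
df = pd.DataFrame(
    {'price': [decimal.Decimal('1.10'), decimal.Decimal('2.50')]},
    index=pd.to_datetime(['2010-01-31', '2010-02-28']),
)

# With no path argument, to_csv returns the CSV content as a string.
text = df.to_csv()
print(text)

# Default read: the dates come back as plain strings, the Decimals as floats.
loaded = pd.read_csv(io.StringIO(text), index_col=0)
print(loaded.index.dtype, loaded['price'].dtype)

# Explicit read: the caller supplies the type information the file lacks.
restored = pd.read_csv(
    io.StringIO(text),
    index_col=0,
    parse_dates=True,
    converters={'price': decimal.Decimal},
)
print(restored.index.dtype, type(restored['price'].iloc[0]).__name__)
```

The second read only works because the caller already knows what the original types were; nothing in the file itself records them.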
Upvotes: 3
Reputation: 369
You are saving the dates as the index and then comparing them with the values of your DataFrame. Do this instead:
import pandas as pd
idx = pd.date_range(start='2010', periods=100, freq='1M')
ts = pd.Series(data=range(100), index=idx)
ts.to_csv(f'test.csv')
imported_ts= pd.read_csv('test.csv', delimiter=',', index_col=['Unnamed: 0'])
print(ts.index.equals(imported_ts.index))
Gives
True
Upvotes: 0