user3655574
user3655574

Reputation: 723

python pandas exported csv format different from imported issue

I have a strange issue on pandas.read_csv function. I exported my dataframe into a csv, but when I re-imported the same csv, the data that has been imported back does not work when I try to merge(The merge shows all the data on the left and none that I have tried to merge it with). If I use the original data before it was exported to the csv, it works completely fine.(The merge was perfect).

df = df.values_list('id','teacher_id','uniquecount','nonuniquecount','msgcount','ordercount','date','updated','timestamp', flat=False)
#inserting the collected data into a dateframe for manipulation
df = pd.DataFrame(list(df))
#giving the dataframe column names
df.columns = ['id','teacher_id','uniquecount','nonuniquecount','msgcount','ordercount','date','updated','timestamp']
df = df[['id','teacher_id','uniquecount','nonuniquecount','msgcount','ordercount','date']]
#rename required columns
df.rename(columns={'uniquecount':'Unique Views','nonuniquecount':'Views','msgcount':'Messages','ordercount':'Orders'}, inplace=True)

print df
print df.dtypes
# exporting df out to a csv
# df.to_csv('test.csv', header=True)
# importing the df back from a csv
df = pd.read_csv('test.csv', index_col=0)
print df
print df.dtypes

#insert dates
numdays = 14
base = datetime.datetime.today().date()
date_list = [base - datetime.timedelta(days=x) for x in range(0, numdays)]
dates = pd.DataFrame(date_list)
dates.columns = ['date']

#merge the complete dates with the dateframe
df = pd.merge(dates ,df , on=['date'] , how='left')
# print df

I have checked and compared that the dataframes look exactly the same before export and after importing from the csv.(I printed the output twice, once before export and one after) I have also checked and the datetypes are all the same.

I need to export the csv to work with an external environment because I cant attach my local database.

attached a copy of the cmdline print which shows that both dataframes are exactly similar

cmd line interface image

attached below is a sample of my exported csv

,id,teacher_id,Unique Views,Views,Messages,Orders,date
0,47,31,1,6,0,0,2017-05-09
1,56,31,1,9,0,0,2017-05-10
2,67,31,2,11,0,0,2017-05-14
3,71,31,3,15,0,0,2017-05-15
4,79,31,3,17,0,0,2017-06-12
5,83,31,3,18,0,1,2017-06-18

Does anyone have any idea on this strange issue?

Upvotes: 1

Views: 203

Answers (1)

niraj
niraj

Reputation: 18208

Before calling merge, try converting both dates using to_datetime first as referred in answer here

df.date = pd.to_datetime(df.date)
dates.date = pd.to_datetime(dates.date)

#merge the complete dates with the dateframe
df = pd.merge(dates ,df , on=['date'] , how='left')

Upvotes: 1

Related Questions