Rawia Sammout
Rawia Sammout

Reputation: 221

Date comparison in python

i have already two datasets each one has 2 columns (date, close) i want to compare date of the first dataset to the date of the second dataset if they are the same date the close of the second dataset takes the value relative to the date in question else it takes the value of the date of previous day.

This is the dataset https://www.euronext.com/fr/products/equities/FR0000120644-XPAR https://fr.finance.yahoo.com/quote/%5EFCHI/history?period1=852105600&period2=1528873200&interval=1d&filter=history&frequency=1d

This is my code:

import numpy as np
from datetime import datetime , timedelta
import pandas as pd
#import cac 40 stock index (dataset1)
df = pd.read_csv('cac 40.csv')
df = pd.DataFrame(df) 
#import Danone index(dataset2)
df1 = pd.read_excel('Price_Data_Danone.xlsx',header=3)
df1 = pd.DataFrame(df1) 
#check the number of observation of both datasets and get the minimum number
if len(df1)>len(df):
    size=len(df)
elif len(df1)<len(df):
     size=len(df1)
else:
     size=len(df)
#get new close values of dataset2 relative to the date in datset1
close1=np.zeros((size))
for i in range(0,size,1):
    # find the date of dataset1 in dataset 2
    if (df['Date'][i]in df1['Date']):
    #get the index of the date and the corresponding value of close and store it in close1
        close1[i]=df['close'][df1.loc['Date'][i], df['Date']]
    else:
        #if the date doesen't exist in datset2
    #take value of close of previous date of datatset1
        close1[i]=df['close'][df1.loc['Date'][i-1], df['Date']]

This is my trail, i got this error : KeyError: 'the label [Date] is not in the [index]' Examples:

we look for the value df['Date'][1] = '5/06/2009' in the column df1['Date'] we get its index in df1['Date'] then close1=df1['close'][index] else if df['Date'][1] = '5/06/2009' not in df1['Date'] we get the index of the previous date df['Date'][0] = '4/06/2009' close1=df1['close'][previous index]

Upvotes: 0

Views: 158

Answers (1)

dylan_fan
dylan_fan

Reputation: 720

Your error happens in line:

close1[i]=df['close'][df1.loc['Date'][i], df['Date']]

If your goal here is to get close value from df given i index you should write:

close[i] = df['close'][i]

See if that helps, unfortunately I don't understand fully what you are trying to accomplish, for example why do you set size to the length of shorter table? Also, as long as I downloaded correct files, your condition df['Date'][i]in df1['Date'] might not work, one date format uses - and the other \.

Solution

import pandas as pd


pd.set_option('expand_frame_repr', False)

# load both files
df = pd.read_csv('CAC.csv')
df1 = pd.read_csv('DANONE.csv', header=3)

# ensure date format is the same between two
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df1.Date = pd.to_datetime(df1.Date, dayfirst=True)

# you need only Date and Close columns as far as I understand
keep_columns = ['Date', 'Close']

# let's keep only these columns then
df = df[keep_columns]
df1 = df1[keep_columns]

# merge two tables on Date, method is left so that for every row in df we 
# 'append' row from df1 if possible, if not there will be NaN value, 
# for readability I added suffixes df - CAC and df1 - DANONE
merged = pd.merge(df,
                  df1,
                  on='Date',
                  how='left',
                  suffixes=['CAC', 'DANONE'])

# now for all missing values in CloseDANONE, so if there is Date in df
# but not in df1 we fill this value with LAST available
merged.CloseDANONE.fillna(method='ffill', inplace=True)

# we get values from CloseDANONE column as long as it's not null
close1 = merged.loc[merged.CloseDANONE.notnull(), 'CloseDANONE'].values

Below you can see: last 6 values from df - CAC

           Date        Close
5522 2018-06-06  5457.560059
5523 2018-06-07  5448.359863
5524 2018-06-08  5450.220215
5525 2018-06-11  5473.910156
5526 2018-06-12  5453.370117
5527 2018-06-13  5468.240234

last 6 values from df1 - DANONE:

        Date  Close
0 2018-06-06  63.86
1 2018-06-07  63.71
2 2018-06-08  64.31
3 2018-06-11  64.91
4 2018-06-12  65.43

last 6 rows from merged:

           Date     CloseCAC  CloseDANONE
5522 2018-06-06  5457.560059        63.86
5523 2018-06-07  5448.359863        63.71
5524 2018-06-08  5450.220215        64.31
5525 2018-06-11  5473.910156        64.91
5526 2018-06-12  5453.370117        65.43
5527 2018-06-13  5468.240234        65.43

For every value that was present in df we get value from df1, but 2018-06-13 is not present in df1 so I fill it with last available value which is 65.43 from 2018-06-12.

Upvotes: 1

Related Questions