Reputation: 221
i have already two datasets each one has 2 columns (date, close) i want to compare date of the first dataset to the date of the second dataset if they are the same date the close of the second dataset takes the value relative to the date in question else it takes the value of the date of previous day.
This is the dataset https://www.euronext.com/fr/products/equities/FR0000120644-XPAR https://fr.finance.yahoo.com/quote/%5EFCHI/history?period1=852105600&period2=1528873200&interval=1d&filter=history&frequency=1d
This is my code:
import numpy as np
from datetime import datetime , timedelta
import pandas as pd
#import cac 40 stock index (dataset1)
df = pd.read_csv('cac 40.csv')
df = pd.DataFrame(df)
#import Danone index(dataset2)
df1 = pd.read_excel('Price_Data_Danone.xlsx',header=3)
df1 = pd.DataFrame(df1)
#check the number of observation of both datasets and get the minimum number
if len(df1)>len(df):
size=len(df)
elif len(df1)<len(df):
size=len(df1)
else:
size=len(df)
#get new close values of dataset2 relative to the date in datset1
close1=np.zeros((size))
for i in range(0,size,1):
# find the date of dataset1 in dataset 2
if (df['Date'][i]in df1['Date']):
#get the index of the date and the corresponding value of close and store it in close1
close1[i]=df['close'][df1.loc['Date'][i], df['Date']]
else:
#if the date doesen't exist in datset2
#take value of close of previous date of datatset1
close1[i]=df['close'][df1.loc['Date'][i-1], df['Date']]
This is my trail, i got this error : KeyError: 'the label [Date] is not in the [index]' Examples:
we look for the value df['Date'][1]
= '5/06/2009' in the column df1['Date']
we get its index in df1['Date']
then close1=df1['close'][index]
else if df['Date'][1]
= '5/06/2009' not in df1['Date']
we get the index of the previous date df['Date'][0]
= '4/06/2009'
close1=df1['close'][previous index]
Upvotes: 0
Views: 158
Reputation: 720
Your error happens in line:
close1[i]=df['close'][df1.loc['Date'][i], df['Date']]
If your goal here is to get close
value from df
given i
index you should write:
close[i] = df['close'][i]
See if that helps, unfortunately I don't understand fully what you are trying to accomplish, for example why do you set size
to the length of shorter table?
Also, as long as I downloaded correct files, your condition df['Date'][i]in df1['Date']
might not work, one date format uses -
and the other \
.
Solution
import pandas as pd
pd.set_option('expand_frame_repr', False)
# load both files
df = pd.read_csv('CAC.csv')
df1 = pd.read_csv('DANONE.csv', header=3)
# ensure date format is the same between two
df.Date = pd.to_datetime(df.Date, dayfirst=True)
df1.Date = pd.to_datetime(df1.Date, dayfirst=True)
# you need only Date and Close columns as far as I understand
keep_columns = ['Date', 'Close']
# let's keep only these columns then
df = df[keep_columns]
df1 = df1[keep_columns]
# merge two tables on Date, method is left so that for every row in df we
# 'append' row from df1 if possible, if not there will be NaN value,
# for readability I added suffixes df - CAC and df1 - DANONE
merged = pd.merge(df,
df1,
on='Date',
how='left',
suffixes=['CAC', 'DANONE'])
# now for all missing values in CloseDANONE, so if there is Date in df
# but not in df1 we fill this value with LAST available
merged.CloseDANONE.fillna(method='ffill', inplace=True)
# we get values from CloseDANONE column as long as it's not null
close1 = merged.loc[merged.CloseDANONE.notnull(), 'CloseDANONE'].values
Below you can see: last 6 values from df - CAC
Date Close
5522 2018-06-06 5457.560059
5523 2018-06-07 5448.359863
5524 2018-06-08 5450.220215
5525 2018-06-11 5473.910156
5526 2018-06-12 5453.370117
5527 2018-06-13 5468.240234
last 6 values from df1 - DANONE:
Date Close
0 2018-06-06 63.86
1 2018-06-07 63.71
2 2018-06-08 64.31
3 2018-06-11 64.91
4 2018-06-12 65.43
last 6 rows from merged:
Date CloseCAC CloseDANONE
5522 2018-06-06 5457.560059 63.86
5523 2018-06-07 5448.359863 63.71
5524 2018-06-08 5450.220215 64.31
5525 2018-06-11 5473.910156 64.91
5526 2018-06-12 5453.370117 65.43
5527 2018-06-13 5468.240234 65.43
For every value that was present in df
we get value from df1
, but 2018-06-13
is not present in df1
so I fill it with last available value which is 65.43
from 2018-06-12
.
Upvotes: 1