Reputation: 49
I would like to standardize my dataframe so that it starts and ends on precise dates, but I can't find a solution. I am dealing with a time series, so it is crucial that everything starts and ends on the same day.
I have tried several pieces of code, including code from Stack Overflow, but nothing works.
Right now I just want the rows that are between 01/01/2010 and 31/12/2017. This is the code I have so far:
df=pd.read_csv("AREX.csv", sep = ";")
df[~df['Date'].isin(pd.date_range(start='20100101', end='20171231'))]
print(df)
df.drop(["Open","High","Low","Volume","Open interest"],axis = 1, inplace=True)
print(df)
But it does not affect the number of rows; it only drops the columns I ask it to.
Does anyone have any idea how to fix this?
Thank you in advance for any piece of advice you can give me!
Upvotes: 2
Views: 52
Reputation: 49
OK, so thanks to @RafaelC, here is the answer to my problem.
import glob
import os
import shutil

import pandas as pd

def concatenate(indir="../Equity_Merton", outfile = "../Merged.csv"):
    # Work inside the input directory so the glob pattern is relative to it.
    os.chdir(indir)
    fileList = glob.glob("*.csv")
    ticker = []
    main_df = pd.DataFrame()
    for filename in fileList:
        print(filename)
        df=pd.read_csv(filename, sep = ";")
        ticker.append(df)
        # Parse the dates so they compare as dates rather than as strings.
        df["Date"] = pd.to_datetime(df['Date'])
        # Keep only 2010-01-01 through 2017-12-31 and assign the result back to df.
        df = df[(df.Date <= '2017-12-31') & (df.Date >= '2010-01-01')]
        df.set_index("Date", inplace=True)
        # Name the Close column after the file (file name without ".csv").
        df.rename(columns = {"Close": filename[0:len(filename) - 4]}, inplace = True)
        df.drop(["Open","High","Low","Volume","Open interest"],axis = 1, inplace=True)
        if main_df.empty:
            main_df = df
        else:
            # The outer join keeps every date that appears in any file.
            main_df = main_df.join(df, how='outer')
    # main_df = main_df.dropna(axis = 0, how="any")
    main_df.sort_index(axis=0, level=None, ascending=False, inplace=True, kind='quicksort', na_position='last')
    print(main_df.head())
    main_df.to_csv('Merton_Merged.csv')
    shutil.move("Merton_Merged.csv", "../Merton_Merged.csv")
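For anyone comparing this with the code in my question, here is a minimal sketch of what I believe were the two changes that mattered: converting the Date column with pd.to_datetime, and assigning the filtered result back instead of discarding it. The small dataframe below is made up for illustration (it is not from AREX.csv), and the names mask and filtered are just mine.

```python
import pandas as pd

# Made-up sample data for illustration only; not taken from AREX.csv.
df = pd.DataFrame({
    "Date": ["2009-12-31", "2010-01-01", "2013-06-15", "2017-12-31", "2018-01-02"],
    "Close": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Convert the strings to real timestamps so the comparison is done on dates.
df["Date"] = pd.to_datetime(df["Date"])

# Build a boolean mask for the wanted range (both ends included).
mask = (df["Date"] >= "2010-01-01") & (df["Date"] <= "2017-12-31")

# Indexing returns a new dataframe; it has to be assigned to a name to be kept.
filtered = df[mask]

print(len(df), len(filtered))
```

In the question, the result of the df[...] expression was never assigned, so df itself was left unchanged, and the Date column was still plain text when it was compared against the date range.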
Thank you for your help!!
Upvotes: 1