Reputation: 55
I have a dataframe like this :
dte | res | year |
---|---|---|
1995-01-01 | 65.3 | 1995 |
1995-01-02 | 65.5 | 1995 |
... | ... | ... |
2019-01-03 | 55.2 | 2019 |
2019-01-04 | 52.2 | 2019 |
and I'm trying to create another file in this format :
| year-1995 | res | year-1996 | res | year-1997 | res | year-1998 | res |
|----------- |------|------------|------|------------|------|------------|------|
| 1995-01-01 | 65.3 | 1996-01-01 | 70.5 | 1997-01-01 | 70.5 | 1998-01-01 | 70.5 |
| 1995-01-02 | 65.5 | 1996-01-02 | 68.3 | 1997-01-02 | 68.3 | 1998-01-02 | 68.3 |
| 1995-01-03 | 61.8 | 1996-01-03 | 65.2 | 1997-01-03 | 65.2 | 1998-01-03 | 65.2 |
| 1995-01-04 | 55.2 | 1996-01-04 | 60.4 | 1997-01-04 | 60.4 | 1998-01-04 | 60.4 |
| 1995-01-05 | 52.2 | 1996-01-05 | 55.8 | 1997-01-05 | 55.8 | 1998-01-05 | 55.8 |
basically I want every year in a different column.
Here is what I already did :
import pandas as pd
import numpy as np
import openpyxl as op
import datetime as dt
def reader(file) :
return pd.read_csv(file,sep=";")
vierzon=reader("input/vierzon.csv")
output="output/writer.xlsx"
vierzon['dte']=pd.to_datetime(vierzon['dte'])
vierzon['date']=vierzon['dte'].dt.date
vierzon['year']=pd.DatetimeIndex(vierzon['date']).year
del vierzon['date']
currentYear=vierzon.iloc[0]['year']
myDict={}
count=0
for row in vierzon.itertuples():
if row[3]!=currentYear and count==0 :
df=pd.DataFrame(myDict,index=[row.dte])
df=df.transpose()
myDict={row.dte:row.res}
currentYear=row[3]
if row[3]!=currentYear and count!=0 :
df2=pd.DataFrame(myDict,index=[row.dte])
df2=df2.transpose()
df=df.append(df2)
myDict={row.dte:row.res}
currentYear=row[3]
myDict[row.dte]=row.res
print(df)
df.to_excel(output,sheet_name='vierzon')
when I write in my loop df=pd.DataFrame(myDict,index=[row.dte])
and df2=pd.DataFrame(myDict,index=[row.dte])
it doesn't work at all.
so i tried : df=pd.DataFrame(myDict,index=[output])
and df2=pd.DataFrame(myDict,index=[output])
but my years are still on the same column as my first table.
I already checked, df and df2 seems fine, so maybe my issue is in my .append ?
I know it's about the index, but i have no idea how to fix it, i don't understand well how it works. Can you help me ?
thank you.
Upvotes: 0
Views: 63
Reputation: 23099
set a custom index then use unstack
- we can then flatten the subsequent multi index.
#df['dte'] = pd.to_datetime(df['dte'],errors='coerce')
df1 = df.set_index([df.groupby('year').cumcount(),'year']).unstack(1)
df1 = df1.sort_index(1,level=1)
df1.columns = [f"{x}-{y}" for x,y in df1.columns]
dte-1995 res-1995 dte-2019 res-2019
0 1995-02-01 65.3 2019-05-01 55.2
1 1995-03-01 65.5 2019-06-01 52.2
Upvotes: 2