Reputation: 723
I have a dataframe that looks like this:
**date** **time** **price**
NaN 1103 5
NaN 0010 10
NaN 0100 6
NaN 0201 8
NaN 0015 7
12.03.2020 0400 4
NaN 0500 6
NaN 0800 3
NaN 1000 4
NaN 1140 5
12.03.2020 1200 2
NaN 0030 1
NaN 0112 0
I want to fill the NaN values in the date column by deriving the date from the time field. My idea is to locate the first non-empty value in the date column and then bfill() and ffill() from it, changing the date whenever two consecutive time measurements cross midnight. In our example, the data would look like this after bfill() from the first non-empty cell:
**date** **time** **price**
10.03.2020 1103 5
11.03.2020 0010 10
11.03.2020 0100 6
11.03.2020 0201 8
12.03.2020 0015 7
12.03.2020 0400 4
NaN 0500 6
NaN 0800 3
NaN 1000 4
NaN 1140 5
12.03.2020 1200 2
NaN 0030 1
NaN 0112 0
This is how it will look after ffill() from the first non-empty date value:
**date** **time** **price**
10.03.2020 1103 5
11.03.2020 0010 10
11.03.2020 0100 6
11.03.2020 0201 8
12.03.2020 0015 7
12.03.2020 0400 4
12.03.2020 0500 6
12.03.2020 0800 3
12.03.2020 1000 4
12.03.2020 1140 5
12.03.2020 1200 2
13.03.2020 0030 1
13.03.2020 0112 0
It is worth noting that the first non-empty value can appear anywhere in the date column. I would appreciate some help in solving this problem. Thanks.
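import numpy as np
date = [np.nan, np.nan, np.nan, np.nan, np.nan, "12.03.2020", np.nan, np.nan, np.nan, np.nan, "12.03.2020", np.nan, np.nan]
time = ["1103", "0010", "0100", "0201", "0015", "0400", "0500", "0800", "1000", "1140", "1200", "0030", "0112"]  # strings, so the leading zeros survive
price = [5, 10, 6, 8, 7, 4, 6, 3, 4, 5, 2, 1, 0]
Edit: Lists added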
Upvotes: 3
Views: 203
Reputation: 1598
Give this a try. I am not familiar with your date format, so I assume the dates are stored as text (strings).
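import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

currDate = []  # running state: [current date, time of the previous row]

def ffill(row, direction=1):
    global currDate
    # No anchor date seen yet and this row has none: leave it empty.
    if not currDate and pd.isna(row['date']):  # could also check for a blank string
        return None
    time = int(row['time'])  # accepts 1103 as well as '0010'
    # First row with a date: parse 'dd.mm.yyyy' and start tracking from here.
    if not currDate:
        dt = [int(x) for x in reversed(row['date'].split('.'))]
        dt = datetime.datetime(*dt)
        currDate = [dt, time]
        return dt
    # Time still moves in the fill direction: same day.
    if time * direction > currDate[1] * direction:
        currDate[1] = time
        return currDate[0]
    # Otherwise midnight was crossed: shift the date by one day.
    currDate = [currDate[0] + relativedelta(days=direction), time]
    return currDate[0]

def bfill(row):
    return ffill(row, -1)

df['ffill'] = df.apply(ffill, axis=1)
df.sort_index(ascending=False, inplace=True)  # reverse the rows so bfill walks backwards
currDate = []
df['bfill'] = df.apply(bfill, axis=1)
df.sort_index(ascending=True, inplace=True)  # restore the original order
df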
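If the global state bothers you, the same idea can probably be written without a row-wise apply. The following is only a sketch under a few assumptions: the index is unique, the time column holds HHMM values (strings or ints), a time that does not increase from the previous row means midnight was crossed, and the first non-empty date is used as the single anchor. The names fill_dates and date_filled are made up for this example.

```python
import numpy as np
import pandas as pd

def fill_dates(df):
    """Fill missing dates by counting midnight rollovers in the time column."""
    t = df["time"].astype(int)
    # A new day starts wherever the time fails to increase from the previous row.
    day_no = (t.diff() <= 0).cumsum()
    dates = pd.to_datetime(df["date"], format="%d.%m.%Y")
    anchor = dates.first_valid_index()  # label of the first non-empty date
    if anchor is None:
        return dates  # nothing to anchor on, so everything stays NaT
    # Day offset of every row relative to the anchor row (negative above it).
    offset = day_no - day_no.loc[anchor]
    return dates.loc[anchor] + pd.to_timedelta(offset, unit="D")

df = pd.DataFrame({
    "date": [np.nan] * 5 + ["12.03.2020"] + [np.nan] * 4 + ["12.03.2020"] + [np.nan] * 2,
    "time": ["1103", "0010", "0100", "0201", "0015", "0400", "0500",
             "0800", "1000", "1140", "1200", "0030", "0112"],
    "price": [5, 10, 6, 8, 7, 4, 6, 3, 4, 5, 2, 1, 0],
})
# Convert back to the dd.mm.yyyy text format used in the question.
df["date_filled"] = fill_dates(df).dt.strftime("%d.%m.%Y")
```

On the sample data this should give 10.03.2020 for the first row, 11.03.2020 for the next three, 12.03.2020 up to the 1200 row, and 13.03.2020 for the last two. Note that any later non-empty dates are ignored rather than checked against the computed ones.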
Upvotes: 1