Azee.

Reputation: 723

Calculate values in date column based on time column

I have a dataframe that looks like this:

**date**              **time**            **price**
NaN                      1103                   5  
NaN                      0010                   10
NaN                      0100                   6
NaN                      0201                   8
NaN                      0015                   7
12.03.2020               0400                   4
NaN                      0500                   6
NaN                      0800                   3
NaN                      1000                   4
NaN                      1140                   5
12.03.2020               1200                   2
NaN                      0030                   1
NaN                      0112                   0

I want to fill the NaN values in the date column by deriving each date from the time column. My idea is to locate the first non-empty value in the date column, then bfill() and ffill() from it so that the date changes whenever the time crosses midnight. In our example, the data would look like this after bfill() from the first non-empty cell:

     **date**              **time**            **price**
    10.03.2020               1103                   5  
    11.03.2020               0010                   10
    11.03.2020               0100                   6
    11.03.2020               0201                   8
    12.03.2020               0015                   7
    12.03.2020               0400                   4
    NaN                      0500                   6
    NaN                      0800                   3
    NaN                      1000                   4
    NaN                      1140                   5
    12.03.2020               1200                   2
    NaN                      0030                   1
    NaN                      0112                   0

This is how it will look after ffill() from the first non-empty date value:

          **date**             **time**            **price**
        10.03.2020               1103                   5  
        11.03.2020               0010                   10
        11.03.2020               0100                   6
        11.03.2020               0201                   8
        12.03.2020               0015                   7
        12.03.2020               0400                   4
        12.03.2020               0500                   6
        12.03.2020               0800                   3
        12.03.2020               1000                   4
        12.03.2020               1140                   5
        12.03.2020               1200                   2
        13.03.2020               0030                   1
        13.03.2020               0112                   0

Note that the first non-empty value can appear anywhere in the date column. I would appreciate some help in solving this problem. Thanks.

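from numpy import nan as NaN
date = [NaN, NaN, NaN, NaN, NaN, "12.03.2020", NaN, NaN, NaN, NaN, "12.03.2020", NaN, NaN]
time = ["1103", "0010", "0100", "0201", "0015", "0400", "0500", "0800", "1000", "1140", "1200", "0030", "0112"]  # strings, so the leading zeros survive
price = [5, 10, 6, 8, 7, 4, 6, 3, 4, 5, 2, 1, 0]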

Edit: Lists added

Upvotes: 3

Views: 203

Answers (1)

Bing Wang

Reputation: 1598

Give this a try. I am not familiar with your date format, so I assume the dates are stored as text.

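import datetime

import pandas as pd
from dateutil.relativedelta import relativedelta

currDate = []  # [current date, time of the previously visited row]

def ffill(row, direction=1):
    global currDate
    if not currDate and pd.isna(row['date']):  # could also check for a blank string
        return None
    t = int(row['time'])  # works for ints and for zero-padded strings
    if not currDate:
        # first non-empty date: parse "dd.mm.yyyy" into a datetime
        dt = [int(x) for x in reversed(row['date'].split('.'))]
        dt = datetime.datetime(*dt)
        currDate = [dt, t]
        return dt
    if t * direction > currDate[1] * direction:
        # time keeps moving in the walking direction: still the same day
        currDate[1] = t
        return currDate[0]
    else:
        # time wrapped around midnight: shift the date by one day
        currDate = [currDate[0] + relativedelta(days=direction), t]
        return currDate[0]

def bfill(row):
    return ffill(row, -1)

df['ffill'] = df.apply(ffill, axis=1)
df.sort_index(ascending=False, inplace=True)  # walk the rows bottom-up for bfill
currDate = []
df['bfill'] = df.apply(bfill, axis=1)
df.sort_index(ascending=True, inplace=True)  # restore the original row order
df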
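
The same idea can be written without a global variable and with both passes merged back into the date column. The following is only a sketch under a few assumptions: `fill_dates` is a hypothetical helper name, dates are text in `dd.mm.yyyy` form, consecutive rows are never more than a day apart, and a time that does not advance means midnight was crossed. It anchors on the first non-empty date only and does not cross-check any later dates.

```python
import datetime

import numpy as np
import pandas as pd


def fill_dates(df, date_col="date", time_col="time", fmt="%d.%m.%Y"):
    """Infer every row's date from the first known date and the HHMM times.

    Hypothetical helper: assumes neighbouring rows are at most one day apart.
    """
    times = [int(t) for t in df[time_col]]  # accepts ints or zero-padded strings
    known = df[date_col].notna().to_numpy()
    if not known.any():
        raise ValueError("need at least one non-empty date to anchor on")
    anchor = int(np.argmax(known))  # position of the first non-empty date
    one_day = datetime.timedelta(days=1)
    no_change = datetime.timedelta(0)

    dates = [None] * len(df)
    dates[anchor] = datetime.datetime.strptime(df[date_col].iloc[anchor], fmt)

    # Walk forward: a time that does not increase means a new day has started.
    for i in range(anchor + 1, len(df)):
        step = one_day if times[i] <= times[i - 1] else no_change
        dates[i] = dates[i - 1] + step

    # Walk backward: a time that does not decrease belongs to the previous day.
    for i in range(anchor - 1, -1, -1):
        step = one_day if times[i] >= times[i + 1] else no_change
        dates[i] = dates[i + 1] - step

    return pd.Series(dates, index=df.index, name=date_col)


# Sample data from the question; times are kept as strings to preserve zeros.
df = pd.DataFrame({
    "date": [np.nan] * 5 + ["12.03.2020"] + [np.nan] * 4
            + ["12.03.2020", np.nan, np.nan],
    "time": ["1103", "0010", "0100", "0201", "0015", "0400", "0500",
             "0800", "1000", "1140", "1200", "0030", "0112"],
    "price": [5, 10, 6, 8, 7, 4, 6, 3, 4, 5, 2, 1, 0],
})
df["date"] = fill_dates(df).dt.strftime(fmt := "%d.%m.%Y")
print(df)
```

Equal consecutive times are treated as a day change, which mirrors the strict `>` comparison in the code above; if repeated timestamps within one day are possible in your data, that comparison would need adjusting.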

Upvotes: 1
