Gajanan Arha
Gajanan Arha

Reputation: 55

Compare two columns of Unequal Length in R using logical Operator

I am dealing with a big time series dataset, And I would like to compare two columns So my first Column looks like

            timeperiod          timefortreatment
       2014-08-01 00:00:00        102.81818
       2014-08-01 01:00:00         12.34483
       2014-08-01 02:00:00         35.67568
       2014-08-01 03:00:00        125.57692
       2014-08-01 04:00:00         97.56250
       2014-08-01 05:00:00         36.66667

And the second column looks like

        arrivaltime         
       2014-08-01 00:14:00       
       2014-08-01 00:22:00         
       2014-08-01 00:47:00         
       2014-08-01 01:07:00        
       2014-08-01 01:19:00         
       2014-08-01 01:53:00 

Both are of unequal lengths with second being larger than first. I have to compare the first column with second to get a final one which looks like below. The logic for comparison is that if arrival time in the second column is less than the entry in first column (time being 1 hour here) it gets the value of time of treatment for that specific period

             arrival          timefortreatment
       2014-08-01 00:14:00        102.81818
       2014-08-01 00:22:00        102.81818
       2014-08-01 00:47:00        102.81818
       2014-08-01 01:07:00         12.34483
       2014-08-01 01:19:00         12.34483
       2014-08-01 01:53:00         12.34483

I have made a logic based on two for loops and it is taking forever for 50k + values:

for (i in 1:nrow(date)) 
{
    for (j in 1:nrow(period))
    { 
        if (date[i,1]>=period[j,])
        { 
            z[i,]=t[j,] 
            j=j+1
        } 
    } 
    i=i+1 
}

I was wondering is there any other way in which this can be done. Any help in this regard will be highly appreciated. Editing my answer to accommodate for the cases with different time period.

             timeperiod                  timefortreatment
              2014-08-01 00:14:00               75
              2014-08-01 00:19:00              143
              2014-08-01 00:44:00              126
              2014-08-01 01:04:00              125
              2014-08-01 01:19:00              125
              2014-08-01 01:49:00              122

For this case, output will be as shown below based on same logic i.e. (arrival>=time period)

              arrival          timefortreatment
       2014-08-01 00:14:00            75
       2014-08-01 00:22:00           143
       2014-08-01 00:47:00           126
       2014-08-01 01:07:00           125
       2014-08-01 01:19:00           125
       2014-08-01 01:53:00           122 

Let me know if more details needed

Upvotes: 0

Views: 362

Answers (1)

bVa
bVa

Reputation: 3938

Here is a solution, with only one for loop, faster solution exists.

df1 = data.frame(timeperiod = seq(as.POSIXct("2014-08-01 00:00:00"), as.POSIXct("2014-08-01 05:00:00"), by = "1 hour"),
            timefortreatment = c(102.81818, 12.34483, 35.67568, 125.57692, 97.56250, 36.66667))
df2 = data.frame(arrivaltime = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:22:00"), as.POSIXct("2014-08-01 00:47:00"), as.POSIXct("2014-08-01 01:07:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:53:00")))


library(stringr)
df2$time_min = as.POSIXct(paste0(str_sub(df2$arrivaltime, 1, 14), "00:00"))

for (i in 1:nrow(df2))
{
 df2$timefortreatment[i] = df1$timefortreatment[df1$timeperiod == df2$time_min[i]]
}

EDIT

With no periodicity in timeperiod, you can use difftime function :

df1 = data.frame(timeperiod = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:19:00"), as.POSIXct("2014-08-01 00:44:00"), as.POSIXct("2014-08-01 01:04:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:49:00")), timefortreatment = c(75, 143, 126, 125, 125, 122))
df2 = data.frame(arrivaltime = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:22:00"), as.POSIXct("2014-08-01 00:47:00"), as.POSIXct("2014-08-01 01:07:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:53:00")))

for (i in 1:nrow(df2))
{
  df2$timefortreatment[i] = df1$timefortreatment[which.min(abs(difftime(df2$arrivaltime[i], df1$timeperiod)))]
}


# APPLY solution 

my_function = function(value)
{
  output = df1$timefortreatment[which.min(abs(difftime(value, df1$timeperiod)))]
}
df2$timefortreatment = apply(df2, 1, my_function)


> df2
          arrivaltime timefortreatment
1 2014-08-01 00:14:00               75
2 2014-08-01 00:22:00              143
3 2014-08-01 00:47:00              126
4 2014-08-01 01:07:00              125
5 2014-08-01 01:19:00              125
6 2014-08-01 01:53:00              122

Upvotes: 1

Related Questions