Reputation: 55
I am dealing with a big time series dataset, And I would like to compare two columns So my first Column looks like
timeperiod timefortreatment
2014-08-01 00:00:00 102.81818
2014-08-01 01:00:00 12.34483
2014-08-01 02:00:00 35.67568
2014-08-01 03:00:00 125.57692
2014-08-01 04:00:00 97.56250
2014-08-01 05:00:00 36.66667
And the second column looks like
arrivaltime
2014-08-01 00:14:00
2014-08-01 00:22:00
2014-08-01 00:47:00
2014-08-01 01:07:00
2014-08-01 01:19:00
2014-08-01 01:53:00
Both are of unequal lengths with second being larger than first. I have to compare the first column with second to get a final one which looks like below. The logic for comparison is that if arrival time in the second column is less than the entry in first column (time being 1 hour here) it gets the value of time of treatment for that specific period
arrival timefortreatment
2014-08-01 00:14:00 102.81818
2014-08-01 00:22:00 102.81818
2014-08-01 00:47:00 102.81818
2014-08-01 01:07:00 12.34483
2014-08-01 01:19:00 12.34483
2014-08-01 01:53:00 12.34483
I have made a logic based on two for
loops and it is taking forever for 50k + values:
for (i in 1:nrow(date))
{
for (j in 1:nrow(period))
{
if (date[i,1]>=period[j,])
{
z[i,]=t[j,]
j=j+1
}
}
i=i+1
}
I was wondering is there any other way in which this can be done. Any help in this regard will be highly appreciated. Editing my answer to accommodate for the cases with different time period.
timeperiod timefortreatment
2014-08-01 00:14:00 75
2014-08-01 00:19:00 143
2014-08-01 00:44:00 126
2014-08-01 01:04:00 125
2014-08-01 01:19:00 125
2014-08-01 01:49:00 122
For this case, output will be as shown below based on same logic i.e. (arrival>=time period)
arrival timefortreatment
2014-08-01 00:14:00 75
2014-08-01 00:22:00 143
2014-08-01 00:47:00 126
2014-08-01 01:07:00 125
2014-08-01 01:19:00 125
2014-08-01 01:53:00 122
Let me know if more details needed
Upvotes: 0
Views: 362
Reputation: 3938
Here is a solution, with only one for
loop, faster solution exists.
df1 = data.frame(timeperiod = seq(as.POSIXct("2014-08-01 00:00:00"), as.POSIXct("2014-08-01 05:00:00"), by = "1 hour"),
timefortreatment = c(102.81818, 12.34483, 35.67568, 125.57692, 97.56250, 36.66667))
df2 = data.frame(arrivaltime = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:22:00"), as.POSIXct("2014-08-01 00:47:00"), as.POSIXct("2014-08-01 01:07:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:53:00")))
library(stringr)
df2$time_min = as.POSIXct(paste0(str_sub(df2$arrivaltime, 1, 14), "00:00"))
for (i in 1:nrow(df2))
{
df2$timefortreatment[i] = df1$timefortreatment[df1$timeperiod == df2$time_min[i]]
}
EDIT
With no periodicity in timeperiod, you can use difftime
function :
df1 = data.frame(timeperiod = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:19:00"), as.POSIXct("2014-08-01 00:44:00"), as.POSIXct("2014-08-01 01:04:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:49:00")), timefortreatment = c(75, 143, 126, 125, 125, 122))
df2 = data.frame(arrivaltime = c(as.POSIXct("2014-08-01 00:14:00"), as.POSIXct("2014-08-01 00:22:00"), as.POSIXct("2014-08-01 00:47:00"), as.POSIXct("2014-08-01 01:07:00"), as.POSIXct("2014-08-01 01:19:00"), as.POSIXct("2014-08-01 01:53:00")))
for (i in 1:nrow(df2))
{
df2$timefortreatment[i] = df1$timefortreatment[which.min(abs(difftime(df2$arrivaltime[i], df1$timeperiod)))]
}
# APPLY solution
my_function = function(value)
{
output = df1$timefortreatment[which.min(abs(difftime(value, df1$timeperiod)))]
}
df2$timefortreatment = apply(df2, 1, my_function)
> df2
arrivaltime timefortreatment
1 2014-08-01 00:14:00 75
2 2014-08-01 00:22:00 143
3 2014-08-01 00:47:00 126
4 2014-08-01 01:07:00 125
5 2014-08-01 01:19:00 125
6 2014-08-01 01:53:00 122
Upvotes: 1