Reputation: 157
I have 2 dataframes with different structures. The first one contains data from a continuos and repeated analysis over few samples (multiple rows with time and value for each single measurement), the second one reports the sample ID and the start and finish time of the measurement.
##example
df.analysis <- data.frame(var= rnorm(321,mean=50),
time= seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"), by= 90))
df.sample <- data.frame(sample= rep_len(1:8, 30),
start=seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 7:45:0","%Y-%m-%d %H:%M:%S"),length.out=30),
end=seq(strptime("2018-1-1 0:15:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"),length.out=30))
I should insert the sample ID corresponding to each measured value, having in mind that not all the measurements corrispond to a sample. I tried with the following code but it doesn't work because now it compares the rows from the first database with the corresponding rows from the second database. While I need that every single row from the first database to be compared with all the rows from the second database
if df.analysis$time >df.sample[,"start"] & df.analysis$time < df.sample[,"end"] {
df.analysis$sample <- df.sample$sample
}
I thought to use a for
loop or a lapply
but I can't make work them properly.
Upvotes: 1
Views: 60
Reputation: 20095
One option using sqldf
package can be achieved by having a inner join
and then a left outer join
as:
library(sqldf)
sqldf("select analysis.*, matchedSample.sample from
'df.analysis' analysis
left outer join
(select sample.sample, analysis.time
from 'df.sample' sample,'df.analysis' analysis
where analysis.time > sample.start
and analysis.time < sample.end) matchedSample on
analysis.time = matchedSample.time")
# var time sample
# 1 49.41763 2018-01-01 00:00:00 NA
# 2 50.20399 2018-01-01 00:01:30 1
# 3 48.80242 2018-01-01 00:03:00 1
# 4 50.56982 2018-01-01 00:04:30 1
# 5 50.08948 2018-01-01 00:06:00 1
# 6 50.32223 2018-01-01 00:07:30 1
# 7 49.60842 2018-01-01 00:09:00 1
# 8 50.82316 2018-01-01 00:10:30 1
# ....
# .... 313 more rows
Upvotes: 2
Reputation: 887851
We can use a non-equi join
library(data.table)
setDT(df.analysis)[df.sample, sample := sample, on = .(time > start, time <end)]
Upvotes: 2