Reputation: 508
I have sampling data per depth in a dataframe. Now the orginal depth has to be interpolated to a new depth, for which I have information in a different dataframe. So what needs to happen is:
The first data.frame looks like this:
dat1 = data.frame(Samp = rep(letters[1:4], each = 4),
sam_top = seq(1:4),
sam_bot = seq(2,5),
value = runif(16),
stringsAsFactors = F)
gives me:
Samp sam_top sam_bot value
1 a 1 2 0.75864637
2 a 2 3 0.44056410
3 a 3 4 0.18105700
4 a 4 5 0.67938119
5 b 1 2 0.75444497
6 b 2 3 0.59411439
7 b 3 4 0.11755459
8 b 4 5 0.70143005
9 c 1 2 0.44234565
10 c 2 3 0.82872824
and a second dataframe with new depths for the first one look like this:
dat2 <- read.table(text = "Samp sam_new_bot
a 6
b 7
c 6
d 7.5", header = TRUE)
> dat2
Samp sam_new_bot
1 a 6.0
2 b 7.0
3 c 6.0
4 d 7.5
And the result should look like this:
Samp sam_top sam_bot value
1 a 1 2 0.75864637
2 a 2 3 0.44056410
3 a 3 4 0.18105700
4 a 4 5 0.67938119
5 a 5 6 0.67938119
5 b 1 2 0.75444497
6 b 2 3 0.59411439
7 b 3 4 0.11755459
8 b 4 5 0.70143005
9 b 5 7 0.70143005
10 c 1 2 0.44234565
Note that the data.frame has no fixed intervals, so there could be any number of Samp after each other.
Update: I manage to insert the rows with:
do.call(rbind, by(dat1, dat1$Samp, rbind, dat1[1,]))
Now my understanding is, that I have to replace the dat1[1] part with a function of x, but somehow I don't get things together.
Upvotes: 0
Views: 702
Reputation: 59375
Using your definitions of dat1
and dat2
:
dat3 <- with(dat2,data.frame(Samp, sam_top=NA, sam_bot=sam_new_bot, value=NA))
dat3 <- dat3[dat3$Samp %in% dat1$Samp,] # only use Samp in dat1
dat3$sam_top <- aggregate(sam_bot~Samp,dat1,tail,1)$sam_bot
dat3$value <- aggregate(value~Samp,dat1,tail,1)$value
final <- rbind(dat1,dat3)
final <- final[order(final$Samp,final$sam_top),]
Produces:
final
# Samp sam_top sam_bot value
# 1 a 1 2 0.7586464
# 2 a 2 3 0.4405641
# 3 a 3 4 0.1810570
# 4 a 4 5 0.6793812
# 11 a 5 6 0.6793812
# 5 b 1 2 0.7544450
# 6 b 2 3 0.5941144
# 7 b 3 4 0.1175546
# 8 b 4 5 0.7014301
# 12 b 5 7 0.7014301
# 9 c 1 2 0.4423456
# 10 c 2 3 0.8287282
# 13 c 3 6 0.8287282
The row names are different, but hopefully that is not important.
This code creates a dataframe dat3
with a template for the extra rows based on your dat2
, then sets sam_top
and value
in the template based on your criteria, then adds those rows to your original dat1
, to create final
, then reorders final to match your result. Note the use of the tail(...)
function in aggregate(...)
to extract the last row in a series.
Finally, your dat2
has a Samp="d"
, whereas your dat1
does not; this is the reason for the second line of code.
Upvotes: 1
Reputation: 52647
Here is a solution with data.table
. Logic is as follows:
sam_top
for every Samp
, as well as the corresponding valuedat2
to get the new sam_bot
for each of the new sam_top
dat1
The answer doesn't look exactly the same as your as the data is different (I used your data creation code though):
library(data.table)
dat1 <- data.table(dat1)
dat1.sub <- dat1[ # get new sam_top, as well as last value for Samp
order(sam_top),
list(sam_top=max(sam_bot), value=tail(value, 1L)),
by=Samp
]
dat2.sub <- merge(dat1.sub, dat2, by="Samp")[, c(1, 2, 4, 3), with=F] # match to `dat2`
rbind(dat1, unname(as.list(dat2.sub)))[order(Samp, sam_top)] # now rbind back and order
# Samp sam_top sam_bot value
# 1: a 1 2 0.37475446
# 2: a 2 3 0.87263241
# 3: a 3 4 0.53154291
# 4: a 4 5 0.41648329
# 5: a 5 6 0.41648329
# 6: b 1 2 0.36251215
# 7: b 2 3 0.33083137
# 8: b 3 4 0.03237498
# 9: b 4 5 0.75553453
# 10: b 5 7 0.75553453
# 11: c 1 2 0.98819386
# 12: c 2 3 0.95251107
# 13: c 3 4 0.35680588
# 14: c 4 5 0.34674393
# 15: c 5 6 0.34674393
# 16: d 1 2 0.19573338
# 17: d 2 3 0.59690127
# 18: d 3 4 0.87038993
# 19: d 4 5 0.88751762
# 20: d 5 7 0.88751762
Upvotes: 1