Add row at after certain intervals and copy data into it in R

Question

I have sampling data per depth in a dataframe. Now the orginal depth has to be interpolated to a new depth, for which I have information in a different dataframe. So what needs to happen is:

create a new row after each sample type (i.e. a, b,)
copy "value" from above
take last sam_bot(bottom) as new sam_top
take dat2 sam_new_bot as new sam_bot

The first data.frame looks like this:

dat1 = data.frame(Samp = rep(letters[1:4], each = 4), 
             sam_top = seq(1:4), 
             sam_bot = seq(2,5), 
             value = runif(16), 
             stringsAsFactors = F)

gives me:

   Samp sam_top sam_bot      value
1     a       1       2 0.75864637
2     a       2       3 0.44056410
3     a       3       4 0.18105700
4     a       4       5 0.67938119
5     b       1       2 0.75444497
6     b       2       3 0.59411439
7     b       3       4 0.11755459
8     b       4       5 0.70143005
9     c       1       2 0.44234565
10    c       2       3 0.82872824

and a second dataframe with new depths for the first one look like this:

dat2 <- read.table(text = "Samp sam_new_bot
a      6
b      7
c      6
d      7.5", header = TRUE)

> dat2
  Samp   sam_new_bot
1    a         6.0
2    b         7.0
3    c         6.0
4    d         7.5

And the result should look like this:

   Samp sam_top sam_bot      value
1     a       1       2 0.75864637
2     a       2       3 0.44056410
3     a       3       4 0.18105700
4     a       4       5 0.67938119
5     a       5       6 0.67938119
5     b       1       2 0.75444497
6     b       2       3 0.59411439
7     b       3       4 0.11755459
8     b       4       5 0.70143005
9     b       5       7 0.70143005
10    c       1       2 0.44234565

Note that the data.frame has no fixed intervals, so there could be any number of Samp after each other.

Update: I manage to insert the rows with:

do.call(rbind, by(dat1, dat1$Samp, rbind, dat1[1,]))

Now my understanding is, that I have to replace the dat1[1] part with a function of x, but somehow I don't get things together.

jlhoward · Accepted Answer

Using your definitions of dat1 and dat2:

dat3 <- with(dat2,data.frame(Samp, sam_top=NA, sam_bot=sam_new_bot, value=NA))
dat3         <- dat3[dat3$Samp %in% dat1$Samp,]        # only use Samp in dat1
dat3$sam_top <- aggregate(sam_bot~Samp,dat1,tail,1)$sam_bot
dat3$value   <- aggregate(value~Samp,dat1,tail,1)$value
final        <- rbind(dat1,dat3)
final        <- final[order(final$Samp,final$sam_top),]

Produces:

final
#    Samp sam_top sam_bot     value
# 1     a       1       2 0.7586464
# 2     a       2       3 0.4405641
# 3     a       3       4 0.1810570
# 4     a       4       5 0.6793812
# 11    a       5       6 0.6793812
# 5     b       1       2 0.7544450
# 6     b       2       3 0.5941144
# 7     b       3       4 0.1175546
# 8     b       4       5 0.7014301
# 12    b       5       7 0.7014301
# 9     c       1       2 0.4423456
# 10    c       2       3 0.8287282
# 13    c       3       6 0.8287282

The row names are different, but hopefully that is not important.

This code creates a dataframe dat3 with a template for the extra rows based on your dat2, then sets sam_top and value in the template based on your criteria, then adds those rows to your original dat1, to create final, then reorders final to match your result. Note the use of the tail(...) function in aggregate(...) to extract the last row in a series.

Finally, your dat2 has a Samp="d", whereas your dat1 does not; this is the reason for the second line of code.

Add row at after certain intervals and copy data into it in R

Answers (2)

Related Questions