mace
mace

Reputation: 508

Add row at after certain intervals and copy data into it in R

I have sampling data per depth in a dataframe. Now the orginal depth has to be interpolated to a new depth, for which I have information in a different dataframe. So what needs to happen is:

The first data.frame looks like this:

dat1 = data.frame(Samp = rep(letters[1:4], each = 4), 
             sam_top = seq(1:4), 
             sam_bot = seq(2,5), 
             value = runif(16), 
             stringsAsFactors = F) 

gives me:

   Samp sam_top sam_bot      value
1     a       1       2 0.75864637
2     a       2       3 0.44056410
3     a       3       4 0.18105700
4     a       4       5 0.67938119
5     b       1       2 0.75444497
6     b       2       3 0.59411439
7     b       3       4 0.11755459
8     b       4       5 0.70143005
9     c       1       2 0.44234565
10    c       2       3 0.82872824

and a second dataframe with new depths for the first one look like this:

dat2 <- read.table(text = "Samp sam_new_bot
a      6
b      7
c      6
d      7.5", header = TRUE)

> dat2
  Samp   sam_new_bot
1    a         6.0
2    b         7.0
3    c         6.0
4    d         7.5

And the result should look like this:

   Samp sam_top sam_bot      value
1     a       1       2 0.75864637
2     a       2       3 0.44056410
3     a       3       4 0.18105700
4     a       4       5 0.67938119
5     a       5       6 0.67938119
5     b       1       2 0.75444497
6     b       2       3 0.59411439
7     b       3       4 0.11755459
8     b       4       5 0.70143005
9     b       5       7 0.70143005
10    c       1       2 0.44234565

Note that the data.frame has no fixed intervals, so there could be any number of Samp after each other.

Update: I manage to insert the rows with:

do.call(rbind, by(dat1, dat1$Samp, rbind, dat1[1,]))

Now my understanding is, that I have to replace the dat1[1] part with a function of x, but somehow I don't get things together.

Upvotes: 0

Views: 702

Answers (2)

jlhoward
jlhoward

Reputation: 59375

Using your definitions of dat1 and dat2:

dat3 <- with(dat2,data.frame(Samp, sam_top=NA, sam_bot=sam_new_bot, value=NA))
dat3         <- dat3[dat3$Samp %in% dat1$Samp,]        # only use Samp in dat1
dat3$sam_top <- aggregate(sam_bot~Samp,dat1,tail,1)$sam_bot
dat3$value   <- aggregate(value~Samp,dat1,tail,1)$value
final        <- rbind(dat1,dat3)
final        <- final[order(final$Samp,final$sam_top),]

Produces:

final
#    Samp sam_top sam_bot     value
# 1     a       1       2 0.7586464
# 2     a       2       3 0.4405641
# 3     a       3       4 0.1810570
# 4     a       4       5 0.6793812
# 11    a       5       6 0.6793812
# 5     b       1       2 0.7544450
# 6     b       2       3 0.5941144
# 7     b       3       4 0.1175546
# 8     b       4       5 0.7014301
# 12    b       5       7 0.7014301
# 9     c       1       2 0.4423456
# 10    c       2       3 0.8287282
# 13    c       3       6 0.8287282

The row names are different, but hopefully that is not important.

This code creates a dataframe dat3 with a template for the extra rows based on your dat2, then sets sam_top and value in the template based on your criteria, then adds those rows to your original dat1, to create final, then reorders final to match your result. Note the use of the tail(...) function in aggregate(...) to extract the last row in a series.

Finally, your dat2 has a Samp="d", whereas your dat1 does not; this is the reason for the second line of code.

Upvotes: 1

BrodieG
BrodieG

Reputation: 52647

Here is a solution with data.table. Logic is as follows:

  1. get the max sam_top for every Samp, as well as the corresponding value
  2. merge this to dat2 to get the new sam_bot for each of the new sam_top
  3. rbind back to dat1

The answer doesn't look exactly the same as your as the data is different (I used your data creation code though):

library(data.table)
dat1 <- data.table(dat1)
dat1.sub <- dat1[  # get new sam_top, as well as last value for Samp
  order(sam_top), 
  list(sam_top=max(sam_bot), value=tail(value, 1L)), 
  by=Samp
]
dat2.sub <- merge(dat1.sub, dat2, by="Samp")[, c(1, 2, 4, 3), with=F] # match to `dat2`
rbind(dat1, unname(as.list(dat2.sub)))[order(Samp, sam_top)]  # now rbind back and order
#     Samp sam_top sam_bot      value
# 1:     a       1       2 0.37475446
# 2:     a       2       3 0.87263241
# 3:     a       3       4 0.53154291
# 4:     a       4       5 0.41648329
# 5:     a       5       6 0.41648329
# 6:     b       1       2 0.36251215
# 7:     b       2       3 0.33083137
# 8:     b       3       4 0.03237498
# 9:     b       4       5 0.75553453
# 10:    b       5       7 0.75553453
# 11:    c       1       2 0.98819386
# 12:    c       2       3 0.95251107
# 13:    c       3       4 0.35680588
# 14:    c       4       5 0.34674393
# 15:    c       5       6 0.34674393
# 16:    d       1       2 0.19573338
# 17:    d       2       3 0.59690127
# 18:    d       3       4 0.87038993
# 19:    d       4       5 0.88751762
# 20:    d       5       7 0.88751762

Upvotes: 1

Related Questions