Mark Miller

Reputation: 13103

Retain the first value in each group

I want to replace every value of my.cov with the first value of my.cov within its my.id group. The code below does this, but it uses a for-loop and seems overly complex. Is there a more efficient way, particularly in base R? The object desired.result contains the result I want.

my.data <- read.table(text = '

     my.string   my.cov  my.id
     11.......      1      1
     1.1......      3      2
     ..1.2....      4      2
     ....2.2..      5      2
     12.......      2      3
     .22......      3      3
     ..24.....      3      3
     1...2....      1      4
     ....2...4      0      4
     ..2..4...      5      5
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')

desired.result <- read.table(text = '

     my.string   my.cov  my.id
     11.......      1      1
     1.1......      3      2
     ..1.2....      3      2
     ....2.2..      3      2
     12.......      2      3
     .22......      2      3
     ..24.....      2      3
     1...2....      1      4
     ....2...4      1      4
     ..2..4...      5      5
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')

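# Within-group row counters. Like rle(), this assumes rows with the same
# my.id are adjacent (i.e. the data are sorted by my.id).
my.seq <- rle(my.data$my.id)$lengths
my.data$first <- unlist(lapply(my.seq, function(x) seq(1,x)))    # 1, 2, ... within each group
my.data$last  <- unlist(lapply(my.seq, function(x) seq(x,1,-1)))  # countdown; not used below

my.data$my.new.cov <- rep(NA, nrow(my.data))

# Copy my.cov on each group's first row, then carry that value
# forward to the remaining rows of the same group.
for(i in 1:nrow(my.data)) {
    if(my.data$first[i] == 1) my.data$my.new.cov[i] = my.data$my.cov[i]
    if(my.data$first[i] >  1) my.data$my.new.cov[i] = my.data$my.new.cov[(i - 1)]
}

my.data$my.cov <- my.data$my.new.cov

# Drop the helper columns.
my.data <- my.data[, c('my.string', 'my.cov', 'my.id')]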

all.equal(my.data, desired.result)

# [1] TRUE
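A minimal sketch of the same idea without the loop, keeping the question's rle() step and therefore its assumption that rows with the same my.id are adjacent. The intermediate names first.rows, run.lengths and new.cov are illustrative only.

```r
# Same values as in the question (only the columns needed here).
my.id  <- c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 5L)
my.cov <- c(1L, 3L, 4L, 5L, 2L, 3L, 3L, 1L, 0L, 5L)

# Index of the first row of each group: TRUE where an id appears for the first time.
first.rows <- which(!duplicated(my.id))

# Length of each run of identical consecutive ids; like the rle() call in the
# question, this only lines up with first.rows if each id forms one contiguous run.
run.lengths <- rle(my.id)$lengths

# Repeat each group's first value once per row of that group.
new.cov <- rep(my.cov[first.rows], times = run.lengths)
print(new.cov)
```

This replaces the row-by-row "carry forward" with a single rep() call, but it inherits the sorting requirement of the original code.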

Upvotes: 2

Views: 368

Answers (3)

akrun

Reputation: 887048

We can use the data.table package:

library(data.table)
setDT(my.data)[,  my.cov := my.cov[1L], by = my.id]
my.data
#    my.string my.cov my.id
# 1: 11.......      1     1
# 2: 1.1......      3     2
# 3: ..1.2....      3     2
# 4: ....2.2..      3     2
# 5: 12.......      2     3
# 6: .22......      2     3
# 7: ..24.....      2     3
# 8: 1...2....      1     4
# 9: ....2...4      1     4
#10: ..2..4...      5     5

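NOTE: The base R solution posted here (using split) returns the rows grouped by my.id rather than in their original order, so it can give incorrect results in some cases if the data are not sorted by my.id.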
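A small illustration of that caveat, using hypothetical unsorted toy data (df is not from the question). Each group still gets its first value, but the split/rbind result comes back grouped by id, while a match()-based version (shown only for comparison) keeps the input row order.

```r
# Hypothetical unsorted toy data: rows with the same id are not adjacent.
df <- data.frame(my.cov = c(10L, 20L, 30L, 40L),
                 my.id  = c(2L, 1L, 2L, 1L))

# split()/rbind() approach: rows come back ordered by group (id 1 first, then id 2).
by.split <- do.call(rbind, lapply(split(df, df$my.id), function(group) {
  group$my.cov <- group$my.cov[1]  # first value within this group
  group
}))

# match() returns, for each row, the index of the first row with the same id,
# so the original row order is preserved.
by.match <- df
by.match$my.cov <- df$my.cov[match(df$my.id, df$my.id)]

print(by.split)
print(by.match)
```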

Upvotes: 3

akuiper

Reputation: 214937

Here is a base R solution:

do.call(rbind, lapply(split(my.data, my.data$my.id),
        function(group) {
            group$my.cov <- group$my.cov[1]  # overwrite the column with the group's first value
            group
        }))

    my.string my.cov my.id
1   11.......      1     1
2.2 1.1......      3     2
2.3 ..1.2....      3     2
2.4 ....2.2..      3     2
3.5 12.......      2     3
3.6 .22......      2     3
3.7 ..24.....      2     3
4.8 1...2....      1     4
4.9 ....2...4      1     4
5   ..2..4...      5     5
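A hedged sketch of an order-preserving variant of the split idea: instead of rbind-ing data frame pieces, it splits only the my.cov vector and writes each group's result back with the split<- replacement function, which assigns values to the positions each group came from. The helper name cov is illustrative.

```r
my.data <- data.frame(
  my.string = c("11.......", "1.1......", "..1.2....", "....2.2..", "12.......",
                ".22......", "..24.....", "1...2....", "....2...4", "..2..4..."),
  my.cov = c(1L, 3L, 4L, 5L, 2L, 3L, 3L, 1L, 0L, 5L),
  my.id  = c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 5L),
  stringsAsFactors = FALSE
)

cov <- my.data$my.cov
# Each group's length-1 result is recycled over that group's original positions,
# so row order and row names are unchanged even if my.id were unsorted.
split(cov, my.data$my.id) <- lapply(split(cov, my.data$my.id), function(x) x[1L])
my.data$my.cov <- cov
print(my.data)
```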

Upvotes: 1

Mark Miller

Reputation: 13103

This seems to do it:

my.data$my.cov <- ave(my.data$my.cov, my.data$my.id, FUN = function(x) head(x,1))
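For comparison, a sketch of two base R forms that should agree on this data: ave() applies FUN within each my.id group and returns results in the original row positions (x[1L] is used here in place of head(x, 1), which is equivalent for non-empty groups), and a match()-based one-liner that picks, for each row, the my.cov of the first row sharing its id. Neither relies on the data being sorted; the names via.ave and via.match are illustrative.

```r
my.data <- data.frame(
  my.string = c("11.......", "1.1......", "..1.2....", "....2.2..", "12.......",
                ".22......", "..24.....", "1...2....", "....2...4", "..2..4..."),
  my.cov = c(1L, 3L, 4L, 5L, 2L, 3L, 3L, 1L, 0L, 5L),
  my.id  = c(1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 5L),
  stringsAsFactors = FALSE
)

# Group-wise first value, put back in the original row positions.
via.ave <- ave(my.data$my.cov, my.data$my.id, FUN = function(x) x[1L])

# match(x, x) gives the index of the first occurrence of each id.
via.match <- my.data$my.cov[match(my.data$my.id, my.data$my.id)]

print(identical(via.ave, via.match))
my.data$my.cov <- via.match
```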

Upvotes: 3
