Reputation: 13103
I wish to retain the first value of a variable by group. The code below does this, but uses for-loops and seems overly complex. Is there a more efficient way, particularly in base R? The object desired.result contains my desired result.
my.data <- read.table(text = '
my.string my.cov my.id
11....... 1 1
1.1...... 3 2
..1.2.... 4 2
....2.2.. 5 2
12....... 2 3
.22...... 3 3
..24..... 3 3
1...2.... 1 4
....2...4 0 4
..2..4... 5 5
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
desired.result <- read.table(text = '
my.string my.cov my.id
11....... 1 1
1.1...... 3 2
..1.2.... 3 2
....2.2.. 3 2
12....... 2 3
.22...... 2 3
..24..... 2 3
1...2.... 1 4
....2...4 1 4
..2..4... 5 5
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
my.seq <- rle(my.data$my.id)$lengths
my.data$first <- unlist(lapply(my.seq, function(x) seq(1,x)))
my.data$last <- unlist(lapply(my.seq, function(x) seq(x,1,-1)))
my.data$my.new.cov <- rep(NA, nrow(my.data))
for(i in 1:nrow(my.data)) {
  if(my.data$first[i] == 1) my.data$my.new.cov[i] = my.data$my.cov[i]
  if(my.data$first[i] > 1) my.data$my.new.cov[i] = my.data$my.new.cov[(i - 1)]
}
my.data$my.cov <- my.data$my.new.cov
my.data <- my.data[, c('my.string', 'my.cov', 'my.id')]
all.equal(my.data, desired.result)
# [1] TRUE
Upvotes: 2
Views: 368
Reputation: 887048
We can use data.table
library(data.table)
setDT(my.data)[, my.cov := my.cov[1L], by = my.id]
my.data
# my.string my.cov my.id
# 1: 11....... 1 1
# 2: 1.1...... 3 2
# 3: ..1.2.... 3 2
# 4: ....2.2.. 3 2
# 5: 12....... 2 3
# 6: .22...... 2 3
# 7: ..24..... 2 3
# 8: 1...2.... 1 4
# 9: ....2...4 1 4
#10: ..2..4... 5 5
NOTE: The base R solution posted (using split) will give incorrect results in some cases if the data are not sorted by my.id.
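To illustrate the caveat in the note, here is a minimal base R sketch. The data frame `unsorted` and the helper `first.by.group` are hypothetical names made up for this example, not part of the question. It suggests that `split()` followed by `rbind()` returns the rows grouped by id rather than in their original order, and that `unsplit()` is one way to put them back.

```r
# Hypothetical unsorted data: the ids are interleaved, not contiguous.
unsorted <- data.frame(
  my.cov = c(1, 3, 4, 2),
  my.id  = c(2, 1, 2, 1)
)

# Hypothetical helper: overwrite my.cov with the first value in the group.
first.by.group <- function(group) {
  group$my.cov <- group$my.cov[1]
  group
}

pieces <- lapply(split(unsorted, unsorted$my.id), first.by.group)

# rbind() stacks the groups one after another, so the row order changes.
by.split <- do.call(rbind, pieces)
by.split$my.id   # ids now come back grouped: 1 1 2 2

# unsplit() reverses split() and restores the original row order.
restored <- unsplit(pieces, unsorted$my.id)
restored$my.id   # original order: 2 1 2 1
```

The per-group values are the same in both results; only the row order differs, which matters if the output is assigned back into the original data frame by position.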
Upvotes: 3
Reputation: 214937
Here is a base R solution:
do.call(rbind, lapply(split(my.data, my.data$my.id),
  function(group) {
    group$my.cov = group$my.cov[1]
    group
  }))
#     my.string my.cov my.id
# 1   11.......      1     1
# 2.2 1.1......      3     2
# 2.3 ..1.2....      3     2
# 2.4 ....2.2..      3     2
# 3.5 12.......      2     3
# 3.6 .22......      2     3
# 3.7 ..24.....      2     3
# 4.8 1...2....      1     4
# 4.9 ....2...4      1     4
# 5   ..2..4...      5     5
Upvotes: 1
Reputation: 13103
This seems to do it:
my.data$my.cov <- ave(my.data$my.cov, my.data$my.id, FUN = function(x) head(x,1))
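As a rough check of what `ave()` is doing here, the sketch below compares it with another base R idiom built on `match()`. The data frame `example.data` is a hypothetical copy of the `my.cov` and `my.id` columns from the question, and the `match()` variant is offered only as an assumption-level alternative, not as the method used above. `match(id, id)` returns, for each row, the position of the first row with the same id, so indexing `my.cov` by it picks up the first value per group.

```r
# Hypothetical copy of the two relevant columns from the question.
example.data <- data.frame(
  my.cov = c(1, 3, 4, 5, 2, 3, 3, 1, 0, 5),
  my.id  = c(1, 2, 2, 2, 3, 3, 3, 4, 4, 5)
)

# ave() applies FUN within each group and returns a vector
# in the original row order; the single value is recycled over the group.
via.ave <- ave(example.data$my.cov, example.data$my.id,
               FUN = function(x) x[1])

# match() gives the index of the first occurrence of each id.
first.row <- match(example.data$my.id, example.data$my.id)
via.match <- example.data$my.cov[first.row]

# Both approaches should agree, element by element.
stopifnot(identical(via.ave, via.match))
```

Neither approach depends on the rows being sorted by `my.id`, since both look up the group for each row rather than relying on adjacent rows.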
Upvotes: 3