user42485

Reputation: 811

Conditional data frame variable based on presence on other row

I have a data frame like:

df <- data.frame(Group = c('a', 'a', 'b', 'a', 'b', 'b', 'a', 'b'),
                 ID = c(paste0('x', c('1', '2', '2', '3', '4', '5', '6', '6'))))

I want to assign a third variable, newvar, that looks like below:

df <- data.frame(Group = c('a', 'a', 'b', 'a', 'b', 'b', 'a', 'b'),
                 ID = c(paste0('x', c('1', '2', '2', '3', '4', '5', '6', '6'))),
                 newvar = c('first', 'first', 'second', 'first', 'first', 'first', 'first', 'second'))

Every ID appears either once or twice. If an ID appears in Group a, the row containing a is assigned 'first'. If it appears in both Group a and Group b, the a row is assigned 'first' and the b row is assigned 'second'. If it appears only in b and not in a, newvar is assigned 'first'. How can I write code to assign newvar this way?

Upvotes: 0

Views: 34

Answers (1)

s_baldur

Reputation: 33498

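What about this solution with data.table? It labels rows by their position within each ID, so it relies on the a row coming before the b row, as it does in your example: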

library(data.table)
setDT(df)
df[, newvar := c('first', 'second')[seq_len(.N)], by = .(ID)]
df
   Group ID newvar
1:     a x1  first
2:     a x2  first
3:     b x2 second
4:     a x3  first
5:     b x4  first
6:     b x5  first
7:     a x6  first
8:     b x6 second
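
If the rows are not guaranteed to be ordered with a before b within each ID, one possible variant is to rank Group within each ID instead of using row position. The following is only a sketch in base R (swapping in ave() and rank() rather than data.table), and the helper name assign_newvar is made up for illustration:

```r
# Hypothetical helper: label rows by the alphabetical rank of Group within
# each ID, so the result does not depend on the order of the rows.
assign_newvar <- function(d) {
  labels <- c('first', 'second')
  d$newvar <- ave(
    as.character(d$Group),   # values to rank (coerced in case Group is a factor)
    as.character(d$ID),      # grouping variable
    FUN = function(g) labels[rank(g, ties.method = 'first')]
  )
  d
}

df <- data.frame(Group = c('a', 'a', 'b', 'a', 'b', 'b', 'a', 'b'),
                 ID = paste0('x', c('1', '2', '2', '3', '4', '5', '6', '6')))

res <- assign_newvar(df)

# Same data with the row order reversed, so b comes before a for x2 and x6
res_rev <- assign_newvar(df[nrow(df):1, ])
```

A lone b is ranked 1 within its ID and so gets 'first', while a b that shares its ID with an a is ranked 2 and gets 'second'. This assumes, as stated in the question, that each ID appears at most twice.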

Upvotes: 1
