Reputation: 770

flag rows in groups with multiple conditions

I looked here and elsewhere, but I cannot find something that does exactly what I'm looking to accomplish using R.

I have data similar to below, where col1 is a unique ID, col2 is a group ID variable, col3 is a status code. I need to flag all rows with the same group ID, and where any of those rows have a specific status code, X in this case, as == 1, otherwise 0.

ID   GroupID    Status   Flag
 1    100        A        1
 2    100        X        1
 3    102        A        0
 4    102        B        0
 5    103        B        1 
 6    103        X        1
 7    104        X        1
 8    104        X        1
 9    105        A        0
 10   105        C        0

I have tried writing some ifelse where groupID == groupID and status == X then 1 else 0, but that doesn't work. The pattern of Status is random. In this example, the GroupID is exclusively pairs, but I don't want to assume that in the code, b/c I have other instance where there are 3 or more rows in a GroupID.

It would be helpful if this were open ended IE I could add other conditions if necessary, like, for each matching group ID, where Status == X, and other or other, etc.

Thank you !

Upvotes: 1

Answers (5)

akrun

Reputation: 887891

A base R option with rowsum

i1 <- with(df1, rowsum(+(Status == "X"), group = GroupID) > 0)
transform(df1, Flag = +(GroupID %in% row.names(i1)[i1]))

Or using table

df1$Flag <- +(with(df1, GroupID %in% names(which(table(GroupID,
                  Status == "X")[,2]> 0))))

Upvotes: 0

Chriss Paul

Reputation: 1101

An alternative using data.table

library(data.table)
dt <- read.table(stringsAsFactors = FALSE,text = "ID   GroupID    Status
                 1    100        A
                 2    100        X
                 3    102        A
                 4    102        B
                 5    103        B 
                 6    103        X
                 7    104        X
                 8    104        X
                 9    105        A
                 10   105        C", header=T)
setDT(dt)[,.(ID,Status, Flag=ifelse("X"%in% Status,1,0)),by=GroupID]

#returns
   GroupID ID Status Flag
 1:     100  1      A    1
 2:     100  2      X    1
 3:     102  3      A    0
 4:     102  4      B    0
 5:     103  5      B    1
 6:     103  6      X    1
 7:     104  7      X    1
 8:     104  8      X    1
 9:     105  9      A    0
10:     105 10      C    0

Upvotes: 1

BENY

Reputation: 323376

From base R

ave(df$Status=='X',df$GroupID,FUN=any)
 [1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE  TRUE FALSE FALSE

Upvotes: 2

DanY

Reputation: 6073

Data.table way:

library(data.table)
setDT(df)

df[ , flag := sum(Status == "X") > 0, by=GroupID]

Upvotes: 1

jdobres

Reputation: 11957

Group-based operations like this are easy to do with the dplyr package.

The data:

library(dplyr)

txt <- 'ID   GroupID    Status
1    100        A        
2    100        X        
3    102        A        
4    102        B        
5    103        B         
6    103        X        
7    104        X        
8    104        X        
9    105        A        
10   105        C        '

df <- read.table(text = txt, header = T)

Once we have the data frame, we establish dplyr groups with the group_by function. The mutate command will then be applied per each group, creating a new column entry for each row.

df.new <- df %>% 
  group_by(GroupID) %>% 
  mutate(Flag = as.numeric(any(Status == 'X')))

# A tibble: 10 x 4
# Groups:   GroupID [5]
      ID GroupID Status  Flag
   <int>   <int> <fct>  <dbl>
 1     1     100 A          1
 2     2     100 X          1
 3     3     102 A          0
 4     4     102 B          0
 5     5     103 B          1
 6     6     103 X          1
 7     7     104 X          1
 8     8     104 X          1
 9     9     105 A          0
10    10     105 C          0

Upvotes: 3

flag rows in groups with multiple conditions

Answers (5)

Related Questions