Reputation: 770
I looked here and elsewhere, but I cannot find something that does exactly what I'm looking to accomplish using R.
I have data similar to below, where col1 is a unique ID, col2 is a group ID variable, col3 is a status code. I need to flag all rows with the same group ID, and where any of those rows have a specific status code, X in this case, as == 1, otherwise 0.
ID GroupID Status Flag
1 100 A 1
2 100 X 1
3 102 A 0
4 102 B 0
5 103 B 1
6 103 X 1
7 104 X 1
8 104 X 1
9 105 A 0
10 105 C 0
I have tried writing some ifelse where groupID == groupID and status == X then 1 else 0, but that doesn't work. The pattern of Status is random. In this example, the GroupID is exclusively pairs, but I don't want to assume that in the code, b/c I have other instance where there are 3 or more rows in a GroupID.
It would be helpful if this were open ended IE I could add other conditions if necessary, like, for each matching group ID, where Status == X, and other or other, etc.
Thank you !
Upvotes: 1
Views: 992
Reputation: 887891
A base R
option with rowsum
i1 <- with(df1, rowsum(+(Status == "X"), group = GroupID) > 0)
transform(df1, Flag = +(GroupID %in% row.names(i1)[i1]))
Or using table
df1$Flag <- +(with(df1, GroupID %in% names(which(table(GroupID,
Status == "X")[,2]> 0))))
Upvotes: 0
Reputation: 1101
An alternative using data.table
library(data.table)
dt <- read.table(stringsAsFactors = FALSE,text = "ID GroupID Status
1 100 A
2 100 X
3 102 A
4 102 B
5 103 B
6 103 X
7 104 X
8 104 X
9 105 A
10 105 C", header=T)
setDT(dt)[,.(ID,Status, Flag=ifelse("X"%in% Status,1,0)),by=GroupID]
#returns
GroupID ID Status Flag
1: 100 1 A 1
2: 100 2 X 1
3: 102 3 A 0
4: 102 4 B 0
5: 103 5 B 1
6: 103 6 X 1
7: 104 7 X 1
8: 104 8 X 1
9: 105 9 A 0
10: 105 10 C 0
Upvotes: 1
Reputation: 323376
From base R
ave(df$Status=='X',df$GroupID,FUN=any)
[1] TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE FALSE FALSE
Upvotes: 2
Reputation: 6073
Data.table way:
library(data.table)
setDT(df)
df[ , flag := sum(Status == "X") > 0, by=GroupID]
Upvotes: 1
Reputation: 11957
Group-based operations like this are easy to do with the dplyr package.
The data:
library(dplyr)
txt <- 'ID GroupID Status
1 100 A
2 100 X
3 102 A
4 102 B
5 103 B
6 103 X
7 104 X
8 104 X
9 105 A
10 105 C '
df <- read.table(text = txt, header = T)
Once we have the data frame, we establish dplyr groups with the group_by
function. The mutate command will then be applied per each group, creating a new column entry for each row.
df.new <- df %>%
group_by(GroupID) %>%
mutate(Flag = as.numeric(any(Status == 'X')))
# A tibble: 10 x 4
# Groups: GroupID [5]
ID GroupID Status Flag
<int> <int> <fct> <dbl>
1 1 100 A 1
2 2 100 X 1
3 3 102 A 0
4 4 102 B 0
5 5 103 B 1
6 6 103 X 1
7 7 104 X 1
8 8 104 X 1
9 9 105 A 0
10 10 105 C 0
Upvotes: 3