Reputation: 3
I have a big data frame that I want to subset.
I made an example table: Name is a unique ID, V1 groups all the ARx points that belong to one ID, and V2 is the value I want to build the subset from.
I want to keep every IDx (V1) for which at least one V2 is > 0. In my example table, I would start with selection <- df$V1[which(df$V2 > 0)], which gives me a vector of all V1 IDs where V2 > 0.
Name | V1 | V2 |
---|---|---|
AR1.1 | ID1 | 0 |
AR1.2 | ID1 | 0 |
AR2.1 | ID2 | 0 |
AR2.2 | ID2 | 1 |
AR3.1 | ID3 | 0 |
AR3.2 | ID3 | 1 |
AR3.3 | ID3 | 0 |
AR4.1 | ID4 | 2 |
AR4.2 | ID4 | 0 |
Now comes my problem: I want to apply these V1 IDs back to the whole data frame and select all rows belonging to the IDs in selection, regardless of their V2 value. In other words, I want a sub data frame that contains every row of an IDx as soon as at least one row of that IDx has V2 > 0.
In my example table this would be:
Name | V1 | V2 |
---|---|---|
AR2.1 | ID2 | 0 |
AR2.2 | ID2 | 1 |
AR3.1 | ID3 | 0 |
AR3.2 | ID3 | 1 |
AR3.3 | ID3 | 0 |
AR4.1 | ID4 | 2 |
AR4.2 | ID4 | 0 |
How can I transfer my selection vector to the whole data frame (maybe by matching IDx names)? I tried which again, and %in%, but I only got a smaller subset that was missing the V2 = 0 rows of the selected V1 IDs. Is there maybe a better way to start than with which?
Upvotes: 0
Views: 291
Reputation: 39657
You can use %in%:
selection <- df$V1[df$V2>0]
#selection <- df$V1[which(df$V2>0)] #Alternative
#selection <- unique(selection) #Optional
df[df$V1 %in% selection,]
# Name V1 V2
#3 AR2.1 ID2 0
#4 AR2.2 ID2 1
#5 AR3.1 ID3 0
#6 AR3.2 ID3 1
#7 AR3.3 ID3 0
#8 AR4.1 ID4 2
#9 AR4.2 ID4 0
Data:
df <- data.frame(Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1", "AR3.2", "AR3.3", "AR4.1", "AR4.2")
, V1 = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4")
, V2 = c(0, 0, 0, 1, 0, 1, 0, 2, 0))
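The steps above can be spelled out one at a time, which may help show why %in% keeps the V2 = 0 rows as well. This is only a minimal sketch in base R on the same example data; the names hits, keep, result and result_ave are made up for illustration, and the ave() variant at the end is an alternative that is not part of the answer above.

```r
# Example data from the question (stringsAsFactors set explicitly so that
# V1 is character in every R version)
df <- data.frame(
  Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1",
           "AR3.2", "AR3.3", "AR4.1", "AR4.2"),
  V1 = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4"),
  V2 = c(0, 0, 0, 1, 0, 1, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Step 1: IDs that have at least one row with V2 > 0
hits <- unique(df$V1[df$V2 > 0])   # "ID2" "ID3" "ID4"

# Step 2: one TRUE/FALSE per row of df, TRUE if that row's ID is in hits.
# The test looks only at V1, so rows with V2 == 0 are kept too.
keep <- df$V1 %in% hits

# Step 3: subset the whole data frame with the row mask
result <- df[keep, ]

# Alternative without an intermediate ID vector: ave() applies the function
# per V1 group and returns one value per row (1/0 here, hence as.logical)
keep_ave <- as.logical(ave(df$V2, df$V1, FUN = function(v) any(v > 0)))
result_ave <- df[keep_ave, ]
```

Both result and result_ave should contain the seven rows for ID2, ID3 and ID4. which() is not needed here because a logical vector can index directly; it mainly matters when V2 contains NA, since which() drops NA positions.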
Upvotes: 0
Reputation: 2626
You could do
your_data |>
split(~ V1) |>
rlist::list.filter(any(V2 > 0)) |>
dplyr::bind_rows()
returning
# A tibble: 7 x 3
Name V1 V2
<chr> <chr> <int>
1 AR2.1 ID2 0
2 AR2.2 ID2 1
3 AR3.1 ID3 0
4 AR3.2 ID3 1
5 AR3.3 ID3 0
6 AR4.1 ID4 2
7 AR4.2 ID4 0
(Data used:)
your_data <- structure(list(Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1", "AR3.2", "AR3.3", "AR4.1", "AR4.2"), V1 = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4"), V2 = c(0L, 0L, 0L, 1L, 0L, 1L, 0L, 2L, 0L)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))
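Since dplyr is already used above for bind_rows(), the same "keep the whole group if any V2 > 0" idea could also be written as a grouped filter, without rlist. This is a sketch under the assumption that dplyr is installed and R is at least 4.1 (for the native pipe); it is a swapped-in alternative, not the approach shown above, and the name result is made up for illustration.

```r
# Same example data as a plain data frame
your_data <- data.frame(
  Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1",
           "AR3.2", "AR3.3", "AR4.1", "AR4.2"),
  V1 = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4"),
  V2 = c(0L, 0L, 0L, 1L, 0L, 1L, 0L, 2L, 0L),
  stringsAsFactors = FALSE
)

result <- your_data |>
  dplyr::group_by(V1) |>          # evaluate the condition once per ID
  dplyr::filter(any(V2 > 0)) |>   # keep every row of a group with any V2 > 0
  dplyr::ungroup()                # drop the grouping again
```

Inside a grouped filter(), any(V2 > 0) yields a single TRUE or FALSE per group, which is recycled to all rows of that group, so the V2 = 0 rows of ID2, ID3 and ID4 are kept.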
Upvotes: 0