Sophia

Reputation: 3

Create subset data frame with matching names of a smaller vector subset in R

I have a large data frame that I want to subset.

I made an example table: Name is a unique ID, V1 groups the ARx rows into IDs, and V2 is the value I want to build the subset on. I want to keep every IDx (V1) for which at least one V2 is > 0. In my example table, I would start with selection <- df$V1[which(df$V2 > 0)], which gives me a vector of all V1 IDs where V2 > 0.

Name V1 V2
AR1.1 ID1 0
AR1.2 ID1 0
AR2.1 ID2 0
AR2.2 ID2 1
AR3.1 ID3 0
AR3.2 ID3 1
AR3.3 ID3 0
AR4.1 ID4 2
AR4.2 ID4 0

Now comes my problem: I want to apply the V1 IDs in selection to the whole data frame and select all rows belonging to those IDs, regardless of their V2 value. In other words, I want a sub data frame containing every row of each V1 ID for which at least one row has V2 > 0.

In my example table this would be:

Name V1 V2
AR2.1 ID2 0
AR2.2 ID2 1
AR3.1 ID3 0
AR3.2 ID3 1
AR3.3 ID3 0
AR4.1 ID4 2
AR4.2 ID4 0

How can I apply my selection vector to the whole data frame (maybe by matching IDx names)? I tried which again, and %in%, but I only got a smaller subset that did not include the V2 = 0 rows of the selected V1 IDs. Is there maybe a better way to start than with which?

Upvotes: 0

Views: 291

Answers (2)

GKi

Reputation: 39657

You can use %in%:

selection <- df$V1[df$V2>0]
#selection <- df$V1[which(df$V2>0)] #Alternative
#selection <- unique(selection)     #Optional
df[df$V1 %in% selection,]
#   Name  V1 V2
#3 AR2.1 ID2  0
#4 AR2.2 ID2  1
#5 AR3.1 ID3  0
#6 AR3.2 ID3  1
#7 AR3.3 ID3  0
#8 AR4.1 ID4  2
#9 AR4.2 ID4  0

Data:

df <- data.frame(Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1", "AR3.2", "AR3.3", "AR4.1", "AR4.2")
 , V1 = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4")
 , V2 = c(0, 0, 0, 1, 0, 1, 0, 2, 0))   
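
As a variation on the same idea, the intermediate selection vector can be skipped by computing a per-row logical flag directly. The following is a minimal sketch in base R, assuming the example data from the question; the names keep and result are only illustrative. It uses ave() to evaluate any(V2 > 0) within each V1 group and spread the group's answer back over its rows.

```r
# Example data copied from the question
df <- data.frame(
  Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1",
           "AR3.2", "AR3.3", "AR4.1", "AR4.2"),
  V1   = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4"),
  V2   = c(0, 0, 0, 1, 0, 1, 0, 2, 0)
)

# For every row: TRUE if at least one row with the same V1 has V2 > 0.
# ave() applies FUN per group and recycles the single result over the group.
keep <- ave(df$V2 > 0, df$V1, FUN = any)

# Keep all rows of the flagged groups, including those with V2 == 0
result <- df[keep, ]
```

On the example data this should keep the same rows as the %in% approach. If V2 can contain NA, the comparison V2 > 0 yields NA for those rows, so passing na.rm = TRUE to any() may be worth considering.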

Upvotes: 0

ktiu

Reputation: 2626

You could do

your_data |>
  split(~ V1) |>
  rlist::list.filter(any(V2 > 0)) |>
  dplyr::bind_rows()

returning

# A tibble: 7 x 3
  Name  V1       V2
  <chr> <chr> <int>
1 AR2.1 ID2       0
2 AR2.2 ID2       1
3 AR3.1 ID3       0
4 AR3.2 ID3       1
5 AR3.3 ID3       0
6 AR4.1 ID4       2
7 AR4.2 ID4       0

(Data used:)

your_data <- structure(list(Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1", "AR3.2", "AR3.3", "AR4.1", "AR4.2"), V1 = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4"), V2 = c(0L, 0L, 0L, 1L, 0L, 1L, 0L, 2L, 0L)), row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))
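
If pulling in rlist is not desired, the same grouping logic can probably be expressed with dplyr alone. This is a sketch rather than a drop-in replacement, assuming dplyr is installed and R is at least 4.1 (for the native pipe); df and result are illustrative names. A grouped filter() evaluates any(V2 > 0) once per V1 group and keeps or drops the whole group.

```r
library(dplyr)

# Example data copied from the question
df <- data.frame(
  Name = c("AR1.1", "AR1.2", "AR2.1", "AR2.2", "AR3.1",
           "AR3.2", "AR3.3", "AR4.1", "AR4.2"),
  V1   = c("ID1", "ID1", "ID2", "ID2", "ID3", "ID3", "ID3", "ID4", "ID4"),
  V2   = c(0L, 0L, 0L, 1L, 0L, 1L, 0L, 2L, 0L)
)

result <- df |>
  group_by(V1) |>          # one group per ID
  filter(any(V2 > 0)) |>   # keep every row of a group if any V2 in it is > 0
  ungroup()                # drop the grouping so later steps see a plain table
```

The design choice here is that the condition is evaluated per group rather than per row, which is why rows with V2 equal to 0 survive as long as a sibling row in the same V1 group is positive.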

Upvotes: 0
