Reputation: 209
I have some data that looks like this:
Course_ID Text_ID
33 17
33 17
58 17
5 22
8 22
42 25
42 25
17 26
17 26
35 39
51 39
Not having a background in programming, I'm finding it tricky to articulate my question, but here goes: I only want to keep rows where Course_ID varies but where Text_ID is the same. So for example, the final data would look something like this:
Course_ID Text_ID
5 22
8 22
35 39
51 39
As you can see, Text_ID 22 and 39 are the only ones whose Course_ID values are all different. I suspect subsetting the data would be the way to go, but as I said, I'm quite a novice at this kind of thing and would really appreciate any advice on how to approach this.
Upvotes: 0
Views: 254
Reputation: 39737
You can use ave to test, within each Text_ID group, that anyDuplicated returns 0 (meaning no Course_ID is repeated in that group).
x[ave(x$Course_ID, x$Text_ID, FUN=anyDuplicated)==0,]
# Course_ID Text_ID
#4 5 22
#5 8 22
#10 35 39
#11 51 39
Data:
x <- read.table(header=TRUE, text="Course_ID Text_ID
33 17
33 17
58 17
5 22
8 22
42 25
42 25
17 26
17 26
35 39
51 39")
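To make the one-liner above easier to follow, here is a sketch that splits it into steps. The intermediate names dup_flag, keep and result are my own and not part of the answer; the data is the same as above, just built with data.frame.

```r
# Same values as the answer's data, built directly as a data frame
x <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39)
)

# anyDuplicated() returns the index of the first duplicated value, or 0 if
# there is none. ave() applies it within each Text_ID group and repeats the
# group's result on every row of that group.
dup_flag <- ave(x$Course_ID, x$Text_ID, FUN = anyDuplicated)

# Rows whose group has no repeated Course_ID
keep <- dup_flag == 0

result <- x[keep, ]
result
```

Printing dup_flag before subsetting is a quick way to check which groups are being dropped and why.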
Upvotes: 1
Reputation: 389325
Select those groups where there are no repeats of Course_ID.
In dplyr you can write this as:
library(dplyr)
df %>% group_by(Text_ID) %>% filter(n_distinct(Course_ID) == n()) %>% ungroup
# Course_ID Text_ID
# <int> <int>
#1 5 22
#2 8 22
#3 35 39
#4 51 39
and in data.table:
library(data.table)
setDT(df)[, .SD[uniqueN(Course_ID) == .N], Text_ID]
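One thing to be aware of: a "no repeats" condition is also true for a Text_ID that appears on only one row, so such a group would be kept. The question's data has no such group, so this may not matter. If it does, a group-size check can be added. Below is a base R sketch of that idea (not the dplyr or data.table code above); the extra row with Text_ID 40 is hypothetical and only there to show the difference.

```r
# Question's data plus one hypothetical single-row group (Text_ID 40)
x <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51, 7),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39, 40)
)

# "No repeats" only: the single-row group passes as well
loose <- x[ave(x$Course_ID, x$Text_ID, FUN = anyDuplicated) == 0, ]

# "No repeats" AND more than one row in the group
has_distinct_courses <- function(v) {
  as.numeric(length(v) > 1 && anyDuplicated(v) == 0)
}
strict <- x[ave(x$Course_ID, x$Text_ID, FUN = has_distinct_courses) == 1, ]

loose   # includes the Text_ID 40 row
strict  # only Text_ID 22 and 39
```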
Upvotes: 3
Reputation: 2636
Here is my approach with rlist and dplyr:
library(dplyr)
your_data %>%
split(~ Text_ID) %>%
rlist::list.filter(length(unique(Course_ID)) == length(Course_ID)) %>%
bind_rows()
Returns:
# A tibble: 4 x 2
Course_ID Text_ID
<dbl> <dbl>
1 5 22
2 8 22
3 35 39
4 51 39
# Data used:
your_data <- structure(list(Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51), Text_ID = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39)), row.names = c(NA, -11L), class = c("tbl_df", "tbl", "data.frame"))
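If installing rlist is not an option, the same split / filter / bind pattern can be sketched in base R with split(), Filter() and do.call(rbind, ...). This is an alternative I am adding, not the code from the answer above, and the names groups, kept and result are my own.

```r
# Same values as the answer's data, as a plain data frame
your_data <- data.frame(
  Course_ID = c(33, 33, 58, 5, 8, 42, 42, 17, 17, 35, 51),
  Text_ID   = c(17, 17, 17, 22, 22, 25, 25, 26, 26, 39, 39)
)

# One data frame per Text_ID
groups <- split(your_data, your_data$Text_ID)

# Keep only groups where every Course_ID is distinct
kept <- Filter(function(g) length(unique(g$Course_ID)) == nrow(g), groups)

# Stack the remaining groups back into one data frame
result <- do.call(rbind, kept)
rownames(result) <- NULL  # drop the group-prefixed row names rbind creates

result
```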
Upvotes: 0