Reputation: 89
I am looking to create subsets of Contact.ID's occurring in certain Terms but not other.
To explain a bit better, this is a snapshot of my dataset:
df <- c(Contact.ID, Date.Time, Age, Gender, Attendance)
Contact.ID Date Time Age Gender Attendance Term
1 A 2012-10-06 18:54:48 37 Male 30 Term1
2 A 2013-03-12 20:50:18 37 Male 30 Term2
3 A 2013-05-24 20:18:44 37 Male 30 Term3
4 B 2012-11-15 16:58:15 27 Female 40 Term1
5 B 2012-12-23 10:57:02 27 Female 40 WB
6 B 2013-01-11 17:31:22 27 Female 40 Term2
7 B 2013-02-18 18:37:00 27 Female 40 Term2
8 C 2013-02-22 17:46:07 40 Male 5 Term2
9 C 2013-02-27 11:21:00 40 Male 5 Term2
10 D 2012-10-28 14:48:33 20 Female 12 Term1
My issue is, I need to create further segmentation dependent on Contact.ID's
So the groups I am looking to create are:
Term 1 Only --> ID's that are only present in term 1 but not in any other term (ID D for example)
Term 2 Only ---> ID's that are only present in term 2 but not in any other term (ID C for example)
Term 1 & 2 --> ID's that are only in term 1 and 2 not term 3(ID B for example)
Term 1 & 2 & 3 --> ID's that are present in all terms (ID A for example)
I have tried different ways of adding conditions to the subset
and also tried df[ which ()]
sort of function and subset(df, () & () & !())
but I cant seem to get it right.
Any suggestions? I sincerely appreciate the help.
Upvotes: 1
Views: 790
Reputation: 2806
So i dont know what the WB is in your data set, but I think you can follow this code to fill in what you're looking for. We basically just need to filter on the unique number of terms that each Contact.ID are in and then make sure the terms are correct. I am not counting "WB" as one of the terms since it doesn't look like you are.
library(data.table)
dat = read.table("clipboard", header = TRUE)
setDT(dat)
dat[ , 'Num_Unique_Terms' := uniqueN(Term[Term != "WB"]), by = Contact.ID]
term1 = dat[Num_Unique_Terms == 1 & Term == "Term1"]
term2 = dat[Num_Unique_Terms == 1 & Term == "Term2"]
terms12and3 = dat[Num_Unique_Terms == 3]
dat[ , 'All_1_or_2' := ifelse(all(Term[Term != "WB"] %in% c("Term1", "Term2")), 1, 0), by = Contact.ID]
dat[ , 'All_2_or_3' := ifelse(all(Term[Term != "WB"] %in% c("Term2", "Term3")), 1, 0), by = Contact.ID]
term1and2 = dat[All_1_or_2 == 1 & Num_Unique_Terms == 2]
term2and3 = dat[All_2_or_3 == 1 & Num_Unique_Terms == 2]
Upvotes: 1