Fee

Reputation: 89

R: subset a dataframe by factor levels with a second condition

I am looking to create subsets of Contact.IDs that occur in certain Terms but not in others.

To explain a bit better, this is a snapshot of my dataset:

df <- data.frame(Contact.ID, Date.Time, Age, Gender, Attendance)

   Contact.ID       Date     Time Age Gender Attendance  Term
1           A 2012-10-06 18:54:48  37   Male         30 Term1
2           A 2013-03-12 20:50:18  37   Male         30 Term2
3           A 2013-05-24 20:18:44  37   Male         30 Term3
4           B 2012-11-15 16:58:15  27 Female         40 Term1
5           B 2012-12-23 10:57:02  27 Female         40    WB
6           B 2013-01-11 17:31:22  27 Female         40 Term2
7           B 2013-02-18 18:37:00  27 Female         40 Term2
8           C 2013-02-22 17:46:07  40   Male          5 Term2
9           C 2013-02-27 11:21:00  40   Male          5 Term2
10          D 2012-10-28 14:48:33  20 Female         12 Term1

My issue is that I need to segment the data further, depending on the Contact.IDs.

So the groups I am looking to create are:

I have tried different ways of adding conditions to the subset, and I have also tried the df[which()] approach and subset(df, () & () & !()), but I can't seem to get it right.

Any suggestions? I sincerely appreciate the help.

Upvotes: 1

Views: 790

Answers (1)

Kristofersen

Reputation: 2806

I don't know what WB is in your data set, but I think you can adapt this code to get what you're looking for. We basically just need to count the unique terms that each Contact.ID appears in and then check that those terms are the right ones. I am not counting "WB" as one of the terms, since it doesn't look like you are.

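library(data.table)

# Read the table copied from the question ("clipboard" support is platform-dependent)
dat = read.table("clipboard", header = TRUE)
setDT(dat)

# Number of distinct terms per Contact.ID, ignoring "WB"
dat[ , 'Num_Unique_Terms' := uniqueN(Term[Term != "WB"]), by = Contact.ID]

# Contacts seen in exactly one term, or in three distinct terms
term1 = dat[Num_Unique_Terms == 1 & Term == "Term1"]
term2 = dat[Num_Unique_Terms == 1 & Term == "Term2"]
terms12and3 = dat[Num_Unique_Terms == 3]

# Flag contacts whose non-WB terms all fall within the given pair (1 = yes, 0 = no)
dat[ , 'All_1_or_2' := ifelse(all(Term[Term != "WB"] %in% c("Term1", "Term2")), 1, 0), by = Contact.ID]
dat[ , 'All_2_or_3' := ifelse(all(Term[Term != "WB"] %in% c("Term2", "Term3")), 1, 0), by = Contact.ID]

# Exactly two distinct terms, both drawn from the pair
term1and2 = dat[All_1_or_2 == 1 & Num_Unique_Terms == 2]
term2and3 = dat[All_2_or_3 == 1 & Num_Unique_Terms == 2]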
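If you would rather not depend on data.table, the same idea can be sketched in base R. This is only a rough alternative, not a drop-in replacement for the code above: the data is typed in by hand from the question (just the two columns the grouping needs), and the Term_Set column and the "+" separator are names I made up for illustration. Each Contact.ID gets a label built from its distinct non-WB terms, and the rows are then split by that label.

```r
# Rows from the question, typed in by hand (only the columns the grouping needs)
dat <- data.frame(
  Contact.ID = c("A", "A", "A", "B", "B", "B", "B", "C", "C", "D"),
  Term = c("Term1", "Term2", "Term3", "Term1", "WB", "Term2", "Term2",
           "Term2", "Term2", "Term1"),
  stringsAsFactors = FALSE
)

# Label every row with the sorted set of distinct terms its Contact.ID
# appears in, ignoring "WB". Term_Set is a hypothetical column name.
dat$Term_Set <- ave(
  dat$Term, dat$Contact.ID,
  FUN = function(x) paste(sort(unique(x[x != "WB"])), collapse = "+")
)

# One data frame per term combination, named by its label
groups <- split(dat, dat$Term_Set)

term1       <- groups[["Term1"]]              # only in Term1
term2       <- groups[["Term2"]]              # only in Term2
term1and2   <- groups[["Term1+Term2"]]        # in Term1 and Term2, not Term3
terms12and3 <- groups[["Term1+Term2+Term3"]]  # in all three terms
```

One difference to be aware of: here a contact's WB rows stay in whichever group the contact falls into, whereas the term1 and term2 filters above drop them. Combinations that do not occur in the data (Term2 and Term3, for this sample) simply have no entry in groups.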

Upvotes: 1
