Reputation: 491
My question is similar to this one, but the filter criterion is different.
> demo(dadmom,package="tidyr")
> library(tidyr)
> library(dplyr)
> dadmom <- foreign::read.dta("http://www.ats.ucla.edu/stat/stata/modules/dadmomw.dta")
> dadmom %>%
+ gather(key, value, named:incm) %>%
+ separate(key, c("variable", "type"), -2) %>%
+ spread(variable, value, convert = TRUE)
  famid type   inc name
1     1    d 30000 Bill
2     1    m 15000 Bess
3     2    d 22000  Art
4     2    m 18000  Amy
5     3    d 25000 Paul
6     3    m 50000  Pat
It is easy to pick out the families where the mom's income is above 20000 using "incm" from the original table:
> dadmom
  famid named  incd namem  incm
1     1  Bill 30000  Bess 15000
2     2   Art 22000   Amy 18000
3     3  Paul 25000   Pat 50000
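For comparison, the wide-table filter described above could be sketched as follows. This is only an illustration: the data frame is typed in by hand from the printed table (an assumption, in case the .dta URL is unavailable), and wide_result is an illustrative name.

```r
library(dplyr)

# Hand-typed copy of the wide table printed above
# (assumption: same values as the original .dta file)
dadmom <- data.frame(
  famid = 1:3,
  named = c("Bill", "Art", "Paul"),
  incd  = c(30000, 22000, 25000),
  namem = c("Bess", "Amy", "Pat"),
  incm  = c(15000, 18000, 50000),
  stringsAsFactors = FALSE
)

# In wide form, mom's income is its own column, so a plain row filter is enough
wide_result <- dadmom %>% filter(incm > 20000)
print(wide_result)
```

With these values only famid 3 is kept, since 50000 is the only mom's income above 20000.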
The question is: how do you do it from the "tidied" data?
Upvotes: 4
Views: 6785
Reputation: 887951
You could add group_by and filter to the code:
# OP's code
d1 <- dadmom %>%
  gather(key, value, named:incm) %>%
  separate(key, c("variable", "type"), -2) %>%
  spread(variable, value, convert = TRUE)
# keep a family only if every 'm' row has inc > 15000
d1 %>%
  group_by(famid) %>%
  filter(all(sum(type=='m' & inc > 15000)==sum(type=='m')))
#   famid type   inc name
# 1     2    d 22000  Art
# 2     2    m 18000  Amy
# 3     3    d 25000 Paul
# 4     3    m 50000  Pat
NOTE: The above also works when there are multiple 'm' rows per famid (a bit more general): a family is kept only if every one of its 'm' rows passes the threshold.
For the usual case of a single 'd'/'m' pair per famid:
d1 %>%
  group_by(famid) %>%
  filter(any(inc >15000 & type=='m'))
#   famid type   inc name
# 1     2    d 22000  Art
# 2     2    m 18000  Amy
# 3     3    d 25000 Paul
# 4     3    m 50000  Pat
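To show where the two filters diverge, here is a small sketch on a made-up data frame in which one family has two 'm' rows. The data (d_multi, including the name "Beth") and the result names strict and loose are hypothetical and not part of the original data set.

```r
library(dplyr)

# Hypothetical tidied data: family 1 has two 'm' rows,
# only one of which is above the 15000 threshold
d_multi <- data.frame(
  famid = c(1, 1, 1, 2, 2),
  type  = c("d", "m", "m", "d", "m"),
  inc   = c(30000, 12000, 40000, 22000, 18000),
  name  = c("Bill", "Bess", "Beth", "Art", "Amy"),
  stringsAsFactors = FALSE
)

# Keep families in which every 'm' row exceeds the threshold
strict <- d_multi %>%
  group_by(famid) %>%
  filter(sum(type == "m" & inc > 15000) == sum(type == "m")) %>%
  ungroup()

# Keep families in which at least one 'm' row exceeds the threshold
loose <- d_multi %>%
  group_by(famid) %>%
  filter(any(type == "m" & inc > 15000)) %>%
  ungroup()

print(unique(strict$famid))
print(unique(loose$famid))
```

Here the count-based filter keeps only famid 2, while the any() filter keeps both families. Note also that a family with no 'm' row at all would pass the count-based filter (0 == 0) but not the any() filter, so the choice depends on how such families should be treated.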
Also, if you wish to use data.table, melt from the devel version (i.e. v1.9.5) can take multiple value columns. It can be installed from here
library(data.table)
melt(setDT(dadmom), measure.vars = list(c(2, 4), c(3, 5)),
     variable.name = 'type', value.name = c('name', 'inc'))[
  , type := c('d', 'm')[type]][
  , .SD[any(type == 'm' & inc > 15000)], by = famid]
#    famid type name   inc
# 1:     2    d  Art 22000
# 2:     2    m  Amy 18000
# 3:     3    d Paul 25000
# 4:     3    m  Pat 50000
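As a side note, tidyr 1.0.0 and later offer pivot_longer(), which can do the same multi-column reshape in one step, much like the multi-column melt above. The following is a minimal sketch, assuming a recent tidyr and a hand-typed copy of the wide table; the regular expression and the names tidy and res are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hand-typed copy of the wide table from the question
# (assumption: same values as the original .dta file)
dadmom <- data.frame(
  famid = 1:3,
  named = c("Bill", "Art", "Paul"),
  incd  = c(30000, 22000, 25000),
  namem = c("Bess", "Amy", "Pat"),
  incm  = c(15000, 18000, 50000),
  stringsAsFactors = FALSE
)

# Split each column name into a value-column part ("name"/"inc")
# and a one-letter type suffix ("d"/"m")
tidy <- dadmom %>%
  pivot_longer(
    cols = named:incm,
    names_to = c(".value", "type"),
    names_pattern = "(.+)(.)$"
  )

# Same grouped filter as in the dplyr answer above
res <- tidy %>%
  group_by(famid) %>%
  filter(any(type == "m" & inc > 15000)) %>%
  ungroup()

print(res)
```

The ".value" entry in names_to tells pivot_longer() to turn that part of each column name into an output column, so name and inc keep their own types instead of being stacked into a single value column.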
Upvotes: 5