Reputation: 87
a case
df=data.frame(id=c(101,102,102,103,104,104,104),
calmonth=c('01','01','01','01','01','01','02'),
product=c('apple','apple','htc','htc','apple','htc','nokia'),
bb=sample(1:20,7))
> df
id calmonth product bb
1 101 01 apple 4
2 102 01 apple 9
3 102 01 htc 8
4 103 01 htc 5
5 104 01 apple 16
6 104 01 htc 19
7 104 02 nokia 20
sql statement: get the intersection where id uses both product "apple" and product "htc" when calmonth="01"
select id from df where calmonth='01' and product="apple" and id in
(select id from df where product="htc" and calmonth="01")
the predictive result
id calmonth product
1 102 01 apple & htc
2 104 01 apple & htc
So what's the responding R statement?
Upvotes: 3
Views: 1359
Reputation: 6522
Your SQL statement doesn't produce quite the result you show. It returns (selects) the IDs only not that table and the merged column? No?
SELECT id
FROM df
WHERE calmonth = '01'
AND product = "apple"
AND id IN (SELECT id
FROM df
WHERE product = "htc"
AND calmonth = "01")
in R that is roughly:
with(df,
intersect(
id[calmonth=='01' & product=='apple'],
id[product=="htc" & calmonth=="01"]))
[1] 102 104
Upvotes: 2
Reputation: 56149
If you prefer SQL syntax then use sqldf
package:
library(sqldf)
sqldf("
select *
from (
select id,
calmonth,
group_concat(product, ' & ') product
from df
group by id, calmonth
)
where product='apple & htc' and
calmonth='01'
")
Upvotes: 3