jasmine_007
jasmine_007

Reputation: 87

What's the R statement responding to SQL's 'in' statement?

a case

df=data.frame(id=c(101,102,102,103,104,104,104),
         calmonth=c('01','01','01','01','01','01','02'),
         product=c('apple','apple','htc','htc','apple','htc','nokia'),
         bb=sample(1:20,7))

> df
   id calmonth product bb
1 101       01   apple  4
2 102       01   apple  9
3 102       01     htc  8
4 103       01     htc  5
5 104       01   apple 16
6 104       01     htc 19
7 104       02   nokia 20

sql statement: get the intersection where id uses both product "apple" and product "htc" when calmonth="01"

select id from df where calmonth='01' and product="apple"  and id in 
(select id from df where product="htc" and calmonth="01")

the predictive result

   id calmonth     product
1 102       01 apple & htc
2 104       01 apple & htc

So what's the responding R statement?

Upvotes: 3

Views: 1359

Answers (2)

Stephen Henderson
Stephen Henderson

Reputation: 6522

Your SQL statement doesn't produce quite the result you show. It returns (selects) the IDs only not that table and the merged column? No?

SELECT id 
FROM   df 
WHERE  calmonth = '01' 
       AND product = "apple" 
       AND id IN (SELECT id 
                  FROM   df 
                  WHERE  product = "htc" 
                         AND calmonth = "01") 

in R that is roughly:

with(df, 
intersect(
id[calmonth=='01' & product=='apple'], 
id[product=="htc" & calmonth=="01"]))

[1] 102 104

Upvotes: 2

zx8754
zx8754

Reputation: 56149

If you prefer SQL syntax then use sqldf package:

library(sqldf)
sqldf("
      select * 
      from (
            select id,
                   calmonth,
                   group_concat(product, ' & ') product
            from df
            group by id, calmonth
           ) 
      where product='apple & htc' and
            calmonth='01'
      ")

Upvotes: 3

Related Questions