Yorgos

Reputation: 30445

dplyr: how to avoid hard coding variable names when I need them all?

Here is a simple example. There are only three variables here, but there could be many more. I would like a replacement for every occurrence of c(X1,X2,X3) that does not require naming the columns, but I can't find one.

library(dplyr)
library(MASS)

df <- data.frame(expand.grid(data.frame(matrix(rep(1:7,3),ncol=3))))


df1 <- df %>%
  rowwise() %>%
  filter(length(unique(c(X1,X2,X3)))==3)


df1 %>%
  rowwise() %>%
  filter(max(c(X1,X2,X3))- min(c(X1,X2,X3)) == 2) %>%
  ungroup() %>%
  summarise(res = n()/ nrow(df1)) %>%
  unlist %>%
  as.fractions

Upvotes: 2

Views: 225

Answers (2)

alistaire

Reputation: 43334

It really seems like everything() (newly fully exported) should do the trick, but it doesn't. Especially if you're going to be doing a lot of operations on all your columns, it may be worth making a list column that holds each row as a vector, on which you can easily call unique, max, etc. Here it is assembled with purrr, though you could do the same with apply(df, 1, list) %>% lapply(unlist):

library(purrr)

df1 <- df %>% 
    mutate(data = df %>% transpose() %>% map(unlist)) %>% 
    rowwise() %>% 
    filter(length(unique(data)) == 3)

df1
# Source: local data frame [210 x 4]
# Groups: <by row>
#   
#         X1    X2    X3      data
#      <int> <int> <int>    <list>
#   1      3     2     1 <int [3]>
#   2      4     2     1 <int [3]>
#   3      5     2     1 <int [3]>
#   4      6     2     1 <int [3]>
#   5      7     2     1 <int [3]>
#   6      2     3     1 <int [3]>
#   7      4     3     1 <int [3]>
#   8      5     3     1 <int [3]>
#   9      6     3     1 <int [3]>
#   10     7     3     1 <int [3]>
#   ..   ...   ...   ...       ...

df1 %>%
    rowwise() %>%
    filter(max(data) - min(data) == 2) %>%
    ungroup() %>%
    summarise(res = n() / nrow(df1)) %>%
    unlist %>%
    as.fractions()
# res 
# 1/7 
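
For readers on a newer dplyr, here is a minimal sketch of the same two steps without a helper list column. It assumes dplyr 1.0.0 or later, where c_across() is available inside rowwise() pipelines; that function postdates this answer, so treat it as an alternative rather than the method above. The column names X1, X2, X3 are set explicitly here only so the snippet is self-contained.

```r
library(dplyr)

# Same 7 x 7 x 7 grid as in the question, built directly.
df <- expand.grid(X1 = 1:7, X2 = 1:7, X3 = 1:7)

# Keep rows whose values are all different, without naming any column.
df1 <- df %>%
  rowwise() %>%
  filter(n_distinct(c_across(everything())) == ncol(df)) %>%
  ungroup()

# Share of those rows whose range (max - min) is exactly 2.
res <- df1 %>%
  rowwise() %>%
  filter(max(c_across(everything())) - min(c_across(everything())) == 2) %>%
  ungroup() %>%
  summarise(res = n() / nrow(df1)) %>%
  pull(res)
```

Passing res to MASS::as.fractions() should display it as a fraction, as in the output above. Comparing against ncol(df) instead of a literal 3 keeps the filter valid when more columns are added.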

Upvotes: 2

akrun

Reputation: 886968

We can also do this with data.table:

library(data.table)
res <- setDT(df)[df[ ,uniqueN(unlist(.SD))==3 , 1:nrow(df)]$V1][,
          sum(do.call(pmax, .SD)- do.call(pmin, .SD) ==2)/.N] 
as.fractions(res)
#[1] 1/7

If we need to use dplyr:

library(dplyr)
df1 <- df %>%
         rowwise() %>% 
         do(data.frame(.,i1= n_distinct(unlist(.))==3)) %>% 
         filter(i1) %>% 
         dplyr::select(-i1)
df1 %>% 
    do(data.frame(., i2 = do.call(pmax, .) - do.call(pmin, .) == 2)) %>% 
    filter(i2) %>%
    ungroup() %>% 
    summarise(n = n()/nrow(df1)) %>%
    unlist %>%
    as.fractions
#  n 
#1/7 
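
To make the do.call(pmax, ...) idiom used above easier to follow, here is a small base R sketch of the same computation. It is only an illustration under the assumption that every column is numeric; the intermediate names (row_max, row_min, keep) are made up for this example.

```r
# Same grid as in the question, built directly.
df <- expand.grid(X1 = 1:7, X2 = 1:7, X3 = 1:7)

# A data frame is a list of columns, so do.call() passes every column
# to pmax()/pmin() as a separate argument: one result per row.
row_max <- do.call(pmax, df)
row_min <- do.call(pmin, df)

# Rows in which all values differ, again without naming a column.
keep <- apply(df, 1, function(r) length(unique(r)) == ncol(df))

# Proportion of the kept rows whose range is exactly 2.
res <- mean((row_max - row_min)[keep] == 2)
```

Because pmax() and pmin() are vectorised over rows, this part needs no rowwise() step; only the distinctness check still loops over rows.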

Upvotes: 2
