Reputation: 531

R (data.table): call different columns in a loop

I am trying to call different columns of a data.table inside a loop, to get unique values of each column.

Consider the simple data.table below.

> df <- data.table(var_a = rep(1:10, 2),
+                  var_b = 1:20)
> df
    var_a var_b
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10
11:     1    11
12:     2    12
13:     3    13
14:     4    14
15:     5    15
16:     6    16
17:     7    17
18:     8    18
19:     9    19
20:    10    20

My code works when I call for a specific column outside a loop,

> unique(df$var_a)
 [1]  1  2  3  4  5  6  7  8  9 10
> unique(df[, var_a])
 [1]  1  2  3  4  5  6  7  8  9 10
> unique(df[, "var_a"])
    var_a
 1:     1
 2:     2
 3:     3
 4:     4
 5:     5
 6:     6
 7:     7
 8:     8
 9:     9
10:    10

but not when I do so within a loop that goes through different columns of the data.table.

> for(v in c("var_a","var_b")){
+   print(v)
+   df$v
+   unique(df[, .v])
+   unique(df[, "v"])
+ }
[1] "var_a"
Error in `[.data.table`(df, , .v) : 
  j (the 2nd argument inside [...]) is a single symbol but column name '.v' is not found. Perhaps you intended DT[, ...v]. This difference to data.frame is deliberate and explained in FAQ 1.1.
> 
> unique(df[, ..var_a])
Error in `[.data.table`(df, , ..var_a) : 
  Variable 'var_a' is not found in calling scope. Looking in calling scope because you used the .. prefix.

Upvotes: 0

Answers (4)

langtang

Reputation: 24907

You may also be interested in the env argument of [.data.table (in the development version at the time of writing, and on CRAN since data.table 1.15.0). The illustration below uses a single column, but the same call works inside a loop too.

v="var_a"
df[, v, env=list(v=v)]

Output:

 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10
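
As a rough sketch of the loop version mentioned above (assuming data.table 1.15.0 or later, where env is available), the same substitution can be applied to each column name in turn. The list name uniques is only an illustrative choice.

```r
library(data.table)  # the env argument needs data.table >= 1.15.0

df <- data.table(var_a = rep(1:10, 2), var_b = 1:20)

# Collect the unique values of each column in a named list.
uniques <- list()  # hypothetical name, used only for this sketch
for (v in c("var_a", "var_b")) {
  # env replaces the symbol v inside j with the column named by the string in v
  uniques[[v]] <- df[, unique(v), env = list(v = v)]
}
str(uniques)
```

Because the substitution happens before j is evaluated, unique() runs on the real column and returns a plain vector.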

Upvotes: 0

Hugh

Reputation: 16099

Use .subset2 to refer to a column by its name:

for(v in c("var_a","var_b")) {
  print(unique(.subset2(df, v)))
}
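
For comparison, here is a minimal sketch of the same loop using the ordinary [[ operator, which also extracts a single column by name and may be more familiar than .subset2. The list name res is only illustrative.

```r
library(data.table)

df <- data.table(var_a = rep(1:10, 2), var_b = 1:20)

res <- list()  # hypothetical name, used only for this sketch
for (v in c("var_a", "var_b")) {
  # df[[v]] returns the column named by the string in v as a plain vector
  res[[v]] <- unique(df[[v]])
}
print(res)
```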

Upvotes: 1

r2evans

Reputation: 161110

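For the first problem: when the column name is stored in a variable, you can either use the double-dot ..v syntax or add with=FALSE in the [.data.table call. (df$v does not substitute v; it looks for a column literally named "v", which is why it prints NULL below.)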

for (v in c("var_a", "var_b")) {
  print(v)
  print(df$v)
  ### either one of these will work:
  print(unique(df[, ..v]))
  # print(unique(df[, v, with = FALSE]))
}
# [1] "var_a"
# NULL
#     var_a
#     <int>
#  1:     1
#  2:     2
#  3:     3
#  4:     4
#  5:     5
#  6:     6
#  7:     7
#  8:     8
#  9:     9
# 10:    10
# [1] "var_b"
# NULL
#     var_b
#     <int>
#  1:     1
#  2:     2
#  3:     3
#  4:     4
#  5:     5
#  6:     6
#  7:     7
#  8:     8
#  9:     9
# 10:    10
# 11:    11
# 12:    12
# 13:    13
# 14:    14
# 15:    15
# 16:    16
# 17:    17
# 18:    18
# 19:    19
# 20:    20
#     var_b

But this just prints it without changing anything. If all you want to do is look at unique values within each column (and not change the underlying frame), then I'd likely go with

lapply(df[,.(var_a, var_b)], unique)
# $var_a
#  [1]  1  2  3  4  5  6  7  8  9 10
# $var_b
#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

which shows each column's name alongside its unique values. Using lapply (whether on df as a whole or on a subset of columns) is also preferable to another recommendation, apply(df, 2, unique): apply first converts the frame to a matrix, which coerces every column to a common type, though in this case it returns the same results.
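
A short sketch of both points, under the assumption that the column names of interest are held in a character vector (cols here is a made-up name, as is the small mixed table used to show the coercion):

```r
library(data.table)

df <- data.table(var_a = rep(1:10, 2), var_b = 1:20)
cols <- c("var_a", "var_b")  # hypothetical vector of column names

# lapply over a selected subset keeps each column's own type
by_col <- lapply(df[, ..cols], unique)

# If only the counts are needed, uniqueN over .SD gives a one-row table
n_unique <- df[, lapply(.SD, uniqueN), .SDcols = cols]

# Illustrative mixed-type table: apply() converts it to a character matrix
mixed <- data.table(id = c(1L, 2L, 2L), label = c("a", "b", "b"))
via_apply  <- apply(mixed, 2, unique)   # id values become strings
via_lapply <- lapply(mixed, unique)     # id stays integer
```

With integer-only columns, as in the question, the coercion is harmless, which is why both approaches agree there.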

Upvotes: 1

DPH

Reputation: 4354

Following the hint in the first error message, this is the correct way to select the column in a loop:

for(v in c("var_a","var_b")){

    print(unique(df[, ..v]))

}
# won't print all the lines

As for the second error: you have not declared a variable called var_a, and the .. prefix looks for a variable of that name in the calling scope. It looks like you want to select the column by name instead.

# works as you have shown
unique(df[, "var_a"])

# works once the variable is declared
var_a <- "var_a"
unique(df[, ..var_a])
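
To illustrate the calling-scope lookup, here is a small sketch that wraps the selection in a function. The helper unique_of and its argument names are invented for this example and are not part of data.table.

```r
library(data.table)

df <- data.table(var_a = rep(1:10, 2), var_b = 1:20)

# Hypothetical helper: ..col is resolved in this function's own frame,
# because that is the scope from which `[` is called.
unique_of <- function(dt, col) {
  unique(dt[, ..col])
}

out <- unique_of(df, "var_a")  # a one-column data.table
vals <- out[[1]]               # the same values as a plain vector
```

Note that the ..col form returns a data.table rather than a vector, so [[1]] (or [[col]]) is needed when a vector is wanted.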

Upvotes: 0
