PaulaSpinola

Reputation: 531

R (data.table): call different columns in a loop

I am trying to call different columns of a data.table inside a loop, to get unique values of each column.

Consider the simple data.table below.

> df <- data.table(var_a = rep(1:10, 2),
+                  var_b = 1:20)
> df
    var_a var_b
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10
11:     1    11
12:     2    12
13:     3    13
14:     4    14
15:     5    15
16:     6    16
17:     7    17
18:     8    18
19:     9    19
20:    10    20

My code works when I call for a specific column outside a loop,

> unique(df$var_a)
 [1]  1  2  3  4  5  6  7  8  9 10
> unique(df[, var_a])
 [1]  1  2  3  4  5  6  7  8  9 10
> unique(df[, "var_a"])
    var_a
 1:     1
 2:     2
 3:     3
 4:     4
 5:     5
 6:     6
 7:     7
 8:     8
 9:     9
10:    10

but not when I do so within a loop that goes through different columns of the data.table.

> for(v in c("var_a","var_b")){
+   print(v)
+   df$v
+   unique(df[, .v])
+   unique(df[, "v"])
+ }
[1] "var_a"
Error in `[.data.table`(df, , .v) : 
  j (the 2nd argument inside [...]) is a single symbol but column name '.v' is not found. Perhaps you intended DT[, ...v]. This difference to data.frame is deliberate and explained in FAQ 1.1.
> 
> unique(df[, ..var_a])
Error in `[.data.table`(df, , ..var_a) : 
  Variable 'var_a' is not found in calling scope. Looking in calling scope because you used the .. prefix.

Upvotes: 0

Views: 464

Answers (4)

langtang

Reputation: 24722

You may also be interested in the env argument of data.table's [ method (see the development version). The illustration below is outside a loop, but the same call can be used inside one.

v="var_a"
df[, v, env=list(v=v)]

Output:

 [1]  1  2  3  4  5  6  7  8  9 10  1  2  3  4  5  6  7  8  9 10
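Applied to the loop from the question, the same idea might look like the sketch below. It assumes a data.table version that ships the env argument (the development version when this was written), and the list name uniq is only an illustrative choice. A character scalar passed through env is substituted as a column symbol, so each iteration returns a plain vector.

```r
library(data.table)

df <- data.table(var_a = rep(1:10, 2),
                 var_b = 1:20)

# uniq is a hypothetical container used only for this illustration
uniq <- list()
for (v in c("var_a", "var_b")) {
  # env substitutes the string held in v as the column symbol in j
  uniq[[v]] <- unique(df[, v, env = list(v = v)])
  print(uniq[[v]])
}
```

Storing the result per column name keeps the values available after the loop instead of only printing them.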

Upvotes: 0

Hugh

Reputation: 16089

Use .subset2 to refer to a column by its name:

for(v in c("var_a","var_b")) {
  print(unique(.subset2(df, v)))
}
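For comparison, here is a minimal sketch of the same loop written with the [[ operator, which is the more commonly seen way to pull one column out by a name held in a variable. The variable same is only there to illustrate that both forms return the same vector on this example.

```r
library(data.table)

df <- data.table(var_a = rep(1:10, 2),
                 var_b = 1:20)

for (v in c("var_a", "var_b")) {
  # [[ extracts a single column as a plain vector, by name
  print(unique(df[[v]]))
}

# both extraction forms give the same integer vector here
same <- identical(df[["var_a"]], .subset2(df, "var_a"))
```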

Upvotes: 1

r2evans

Reputation: 160447

For the first problem, when you're referencing a column name indirectly, you can either use double-dot ..v syntax, or add with=FALSE in the data.table::[ construct:

for (v in c("var_a", "var_b")) {
  print(v)
  print(df$v)  # NULL: `$` looks for a column literally named "v"
  ### either one of these will work:
  print(unique(df[, ..v]))
  # print(unique(df[, v, with = FALSE]))
}
# [1] "var_a"
# NULL
#     var_a
#     <int>
#  1:     1
#  2:     2
#  3:     3
#  4:     4
#  5:     5
#  6:     6
#  7:     7
#  8:     8
#  9:     9
# 10:    10
# [1] "var_b"
# NULL
#     var_b
#     <int>
#  1:     1
#  2:     2
#  3:     3
#  4:     4
#  5:     5
#  6:     6
#  7:     7
#  8:     8
#  9:     9
# 10:    10
# 11:    11
# 12:    12
# 13:    13
# 14:    14
# 15:    15
# 16:    16
# 17:    17
# 18:    18
# 19:    19
# 20:    20
#     var_b

But this just prints it without changing anything. If all you want to do is look at unique values within each column (and not change the underlying frame), then I'd likely go with

lapply(df[,.(var_a, var_b)], unique)
# $var_a
#  [1]  1  2  3  4  5  6  7  8  9 10
# $var_b
#  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

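which shows the name and unique values. The use of lapply (whether on df as a whole or on a subset of columns) is also preferable to another recommendation to use apply(df, 2, unique): apply first converts the frame to a matrix, which coerces every column to a common type. In this case both return the same results because both columns are integer.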
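To make the difference between lapply and apply visible, here is a small sketch on a made-up table with mixed column types. The table dt and its columns id and label are assumptions for illustration and are not part of the question's data.

```r
library(data.table)

# hypothetical table with one integer and one character column
dt <- data.table(id = 1:3, label = c("a", "b", "a"))

# lapply works column by column, so each column keeps its own type
by_lapply <- lapply(dt, unique)
str(by_lapply)

# apply converts the table to a matrix first, so id becomes character
by_apply <- apply(dt, 2, unique)
str(by_apply)
```

With columns of a single type, as in the question, the two give the same values, which is why the difference is easy to miss.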

Upvotes: 1

DPH

Reputation: 4344

Following the hint in the first error message, this would be the correct way to select the column inside a loop:

for(v in c("var_a","var_b")){

    print(unique(df[, ..v]))

}
# won't print all the lines

As for the second error, you have not declared a variable called var_a. The .. prefix looks for a variable of that name in the calling scope, whereas it looks like you want to select the column by name.

# works as you have shown
unique(df[, "var_a"])

# works once the variable is declared
var_a <- "var_a"
unique(df[, ..var_a])

Upvotes: 0
