boshek

Reputation: 4416

Performing operation on multiple columns with data.table

I am having trouble figuring out how to perform an operation on multiple columns in data.table, using pattern matching to determine which columns are used. For example:

library(data.table)
library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

iris <- copy(iris)

iris_dplyr_above_6 <- iris %>% 
  select(contains("Length"), Species) %>% 
  gather(col, val, -Species) %>% 
  filter(val > 6) 

unique(iris_dplyr_above_6$Species)
#> [1] versicolor virginica 
#> Levels: setosa versicolor virginica


setDT(iris)
iris_dt_above_6 <- iris[Sepal.Length > 6 | Petal.Length > 6,]
unique(iris_dt_above_6$Species)
#> [1] versicolor virginica 
#> Levels: setosa versicolor virginica

Created on 2019-07-19 by the reprex package (v0.3.0)

In this example I can select columns with dplyr based on the "Length" string. In data.table I have to type out each column by hand. This example is trivial, since typing two column names is hardly onerous, but when there are many columns it is useful to have a programmatic way to select them. I assume data.table has a nifty way of doing this and I just haven't found it yet. Or maybe I am misunderstanding the problem and it is really a base R solution.

Any advice?

Upvotes: 1

Views: 256

Answers (1)

Frank

Reputation: 66819

You can do

melt(iris, id="Species", measure=patterns("Length"))[value > 6, unique(Species)]

How it works. I'm not very familiar with tidyr, but melt corresponds to gather, and the measure.vars argument (abbreviated to measure above) selects the columns to melt. It can also take several groups of columns at once, for example patterns("Length", "Width").
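To make the one-liner easier to follow, here is a minimal sketch that splits it into steps on the built-in iris data. The object names (dt, long, species_above_6, grouped) are made up for illustration, and the value.name labels in the last call are an arbitrary choice.

```r
library(data.table)

dt <- as.data.table(iris)

# Step 1: reshape to long format, keeping only the columns whose
# names match "Length". The result has columns Species, variable, value.
long <- melt(dt, id.vars = "Species", measure.vars = patterns("Length"))

# Step 2: filter on the melted values and list the species that remain.
species_above_6 <- long[value > 6, unique(Species)]
species_above_6

# Several groups at once: each pattern becomes its own value column.
# The names passed to value.name are illustrative labels only.
grouped <- melt(dt, id.vars = "Species",
                measure.vars = patterns("Length", "Width"),
                value.name = c("Length", "Width"))
names(grouped)
```

With a single pattern the melted values land in one column called value, which is why the filter in the one-liner can refer to value directly.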

Context. The melt syntax is inherited from the reshape2 package, originally developed by the same people as dplyr, tidyr, et al. In case you're interested in continuing with gather, note that it is soon to be replaced by pivot_longer, which sounds like it will eventually have the patterns() functionality as well.
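If reshaping is not wanted, the same pattern-based selection can be sketched in wide format. This is an alternative to the melt approach, not part of it: the columns are picked with base R's grep and passed through .SDcols, and the per-column comparisons are combined with a logical OR. The names len_cols, keep and above_6 are hypothetical.

```r
library(data.table)

dt <- as.data.table(iris)

# Pick the column names programmatically instead of typing them out.
len_cols <- grep("Length", names(dt), value = TRUE)

# One logical vector per selected column, OR-ed together row by row.
keep <- dt[, Reduce(`|`, lapply(.SD, function(x) x > 6)), .SDcols = len_cols]

# Same rows as iris[Sepal.Length > 6 | Petal.Length > 6, ] in the question.
above_6 <- dt[keep]
unique(above_6$Species)
```

This keeps the original columns intact, at the cost of being a little more verbose than the melt version.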

Upvotes: 3
