Reputation: 3921
I have a data table (DatosMex) in R and would like to recode a column within it named industry. The distinct categories for this variable are:
Agricultura,Ganaderia,Pesca,Caza Forestal
Asociaciones
Comercio
Construccion
Energia,Petroleo,Gas,Mineria
Gobierno
Industria
N/A
NULL
Servicios
I want to create a new variable, say gr_industry, that groups some categories. For instance, my new variable must group the categories Agricultura,Ganaderia,Pesca,Caza Forestal; Asociaciones; Energia,Petroleo,Gas,Mineria; and Gobierno, and assign them the code 1.
How would you do this using the data.table package syntax?
My approach was this:
# Create a numeric id for each industry (this relies on industry being a factor)
DatosMex[, cod_industria := as.numeric(industry)]
# Create a new data table that maps each id to its group
ind <- data.table(cod_industria = 1:10, gr_industry = c(1, 1, 2, 3, 1, 1, 4, 6, 6, 5))
setkey(DatosMex,cod_industria)
setkey(ind,cod_industria)
DatosMex[ind]
So, as you can see, I had to create a new data table ind and then do the inner join. My question is: is there another way of doing this with data.table? I don't want to create a table each time I need to do something similar. Also, I'd like to avoid using if statements.
Upvotes: 1
Views: 231
Reputation: 263332
I'm guessing one does not need to set a key or create a new data.table. The [ function is generally very fast, especially on data.table objects:
DatosMex[, gr_industry := c(1,1,2,3,1,1,4,6,6,5)[cod_industria] ]
If that grouping translation vector is large then you can refer to it by name, even if it is outside the data.table.
dta <- data.table(a = sample(1:10, 20, replace = TRUE))
g6 <- c(1, 1, 2, 3, 1, 1, 4, 6, 6, 5)
dta[, ind := g6[a]]
#-------------------
a ind
1: 8 6
2: 4 3
3: 10 5
4: 8 6
snipped output
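The same indexing idea also works with a named vector keyed on the category labels, which avoids depending on the order of the factor levels. This is only a minimal sketch: the toy rows and the name grp are assumptions for illustration, and the codes are copied from the question's mapping.

```r
library(data.table)

# Toy data standing in for DatosMex (rows are illustrative only)
DatosMex <- data.table(industry = c("Comercio", "Gobierno", "Asociaciones",
                                    "Servicios", "Gobierno"))

# Hypothetical lookup: names are the industry labels, values are the group codes
grp <- c("Agricultura,Ganaderia,Pesca,Caza Forestal" = 1,
         "Asociaciones" = 1,
         "Comercio" = 2,
         "Construccion" = 3,
         "Energia,Petroleo,Gas,Mineria" = 1,
         "Gobierno" = 1,
         "Industria" = 4,
         "N/A" = 6,
         "NULL" = 6,
         "Servicios" = 5)

# as.character() makes sure we index by label, not by factor integer code;
# unname() drops the labels from the resulting column
DatosMex[, gr_industry := unname(grp[as.character(industry)])]
```

Labels that are missing from grp would come back as NA, which makes unmapped categories easy to spot.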
Upvotes: 4
Reputation: 115392
From a code organization point of view, you need to define the recoding at some point, either as a data.table or in some other form. Here is a switch function example:
## a function that will `switch` based on the levels 1:10
## note that it is Vectorized (to avoid calling `sapply`)
switch_industry <- Vectorize(function(i) { switch(i, 1,1,2,3,1,1,4,6,6,5)})
DatosMex[, gr_industry := switch_industry(cod_industria)]
I would not call this a data.table-specific solution.
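For contrast, one route that is specific to data.table is an update join. It still needs the lookup table, but it skips setkey and adds the column to the original table by reference. A minimal sketch, assuming integer codes in cod_industria and toy rows made up for illustration:

```r
library(data.table)

# Toy data; the cod_industria values are illustrative only
DatosMex <- data.table(cod_industria = c(3L, 6L, 2L, 10L, 8L))

# Lookup table with the same codes as in the question
ind <- data.table(cod_industria = 1:10,
                  gr_industry = c(1, 1, 2, 3, 1, 1, 4, 6, 6, 5))

# Update join: for every row of DatosMex matched in ind,
# copy ind's gr_industry (referred to as i.gr_industry) into DatosMex in place
DatosMex[ind, on = "cod_industria", gr_industry := i.gr_industry]
```

The row order of DatosMex is left unchanged, and no keyed copy of the data is created.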
Upvotes: 2