Reputation: 43
Here is a snippet that could help a few 'R beginners' like me: I was referring to this thread for a need on my melted data table:
Replace entire string anywhere in dataframe based on partial match with dplyr
I was looking for an easy way of replacing an entire string in one of the columns in data table with a partial match string. I could not find a straight fit on the forum, hence this post.
dt<-data.table(x=c("A_1", "BB_2", "CC_3"),y=c("K_1", "LL_2", "MM_3"),z=c("P_1","QQ_2","RR_3"))
> dt
x y z
1: A_1 K_1 P_1
2: BB_2 LL_2 QQ_2
3: CC_3 MM_3 RR_3
Replace multiple values in column y with multiple patterns to match:
dt[,2]<-str_replace_all(as.matrix(dt[,2]),c("K_.*" = "FORMULA","LL_.*" = "RACE","MM_.*" = "CAR"))
Wrapping the column in as.matrix() avoids the warning that str_replace_all() gives about its input.
The result is:
> dt[,2]<-str_replace_all(as.matrix(dt[,2]),c("K_.*" = "FORMULA","LL_.*" = "RACE","MM_.*" = "CAR"))
> dt
x y z
1: A_1 FORMULA P_1
2: BB_2 RACE QQ_2
3: CC_3 CAR RR_3
>
It is inelegant, but it worked for me, and it seemed to be a quick solution when the column is large.
Requires library(stringr).
Any suggestions to improve are appreciated.
Editing this post as I tried something as below:
dt<-data.table(x=c("A_1", "BB_2", "CC_3"),y=c("K_1", "LL_2", "MM_3"),z=c("P_1","QQ_2","RR_3"))
dt[, nu_col := c(1:3)]
molten.dt<-melt(dt,id.vars = "nu_col", measure.vars = c("x","y","z"))
molten.dt[, one_more := ifelse(grepl("A_.*", value), "HONDA","FERRARI")]
The error that I see in RStudio's console is:
Error in `:=`(one_more, ifelse(grepl("A_.*", value), "HONDA", "FERRARI")) :
Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
The same code runs fine in the R terminal:
> dt<-data.table(x=c("A_1", "BB_2", "CC_3"),y=c("K_1", "LL_2", "MM_3"),z=c("P_$
> dt[, nu_col := c(1:3)]
> molten.dt<-melt(dt,id.vars = "nu_col", measure.vars = c("x","y","z"))
> molten.dt
nu_col variable value
1: 1 x A_1
2: 2 x BB_2
3: 3 x CC_3
4: 1 y K_1
5: 2 y LL_2
6: 3 y MM_3
7: 1 z P_1
8: 2 z QQ_2
9: 3 z RR_3
> molten.dt[, one_more := ifelse(grepl("A_.*", value), "HONDA","FERRARI")]
> molten.dt
nu_col variable value one_more
1: 1 x A_1 HONDA
2: 2 x BB_2 FERRARI
3: 3 x CC_3 FERRARI
4: 1 y K_1 FERRARI
5: 2 y LL_2 FERRARI
6: 3 y MM_3 FERRARI
7: 1 z P_1 FERRARI
8: 2 z QQ_2 FERRARI
9: 3 z RR_3 FERRARI
>
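The error message itself suggests a first check: whether molten.dt is really a data.table in the session where := fails. Below is a minimal diagnostic sketch. It assumes (this is not confirmed anywhere in this post) that the RStudio session ends up with a plain data.frame, for example because another package's melt() is masking data.table's. If is.data.table() already returns TRUE there, the cause is something else and this sketch does not address it.

```r
library(data.table)

dt <- data.table(x = c("A_1", "BB_2", "CC_3"),
                 y = c("K_1", "LL_2", "MM_3"),
                 z = c("P_1", "QQ_2", "RR_3"))
dt[, nu_col := 1:3]
molten.dt <- melt(dt, id.vars = "nu_col", measure.vars = c("x", "y", "z"))

# The check the error message asks for; FALSE would point to a
# data.frame (or other non-data.table object) being returned by melt().
is.data.table(molten.dt)

# Assumption: if the check is FALSE, converting by reference with setDT()
# should let := work again.
if (!is.data.table(molten.dt)) setDT(molten.dt)

molten.dt[, one_more := ifelse(grepl("A_.*", value), "HONDA", "FERRARI")]
```

Calling data.table::melt() explicitly is another way to rule out masking.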
Upvotes: 2
Views: 1464
Reputation: 11255
data.table has a different API for updating in place. While this would be dplyr:
tib <- tib %>% mutate(new_col = old_col + 2)
The same thing is done in place using the := operator:
dt[, new_col := old_col + 2]
So note, once we are inside the brackets, we can pass a vector along to other functions. To apply that to your example, we can do...
library(data.table)
library(stringr)
dt<-data.table(x=c("A_1", "BB_2", "CC_3"),y=c("K_1", "LL_2", "MM_3"),z=c("P_1","QQ_2","RR_3"))
dt[, y := str_replace_all(y,c("K_.*" = "FORMULA","LL_.*" = "RACE","MM_.*" = "CAR")) ]
dt
## x y z
## <char> <char> <char>
## 1: A_1 FORMULA P_1
## 2: BB_2 RACE QQ_2
## 3: CC_3 CAR RR_3
Note, since str_replace_all expects a vector, you could have replaced as.matrix(dt[,2]) with dt[[2]]. The difference is that dt[, 2] produces a single-column data.table; as.matrix(dt[, 2]) produces a single-column matrix, whereas dt[[2]] produces a vector. I would still recommend using the dt[, new := old + 2] type of syntax.
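To make the three forms concrete, here is a small sketch that inspects what each one returns for the example table. The names col_dt, col_mat and col_vec are only illustrative.

```r
library(data.table)

dt <- data.table(x = c("A_1", "BB_2", "CC_3"),
                 y = c("K_1", "LL_2", "MM_3"),
                 z = c("P_1", "QQ_2", "RR_3"))

col_dt  <- dt[, 2]             # single-column data.table
col_mat <- as.matrix(dt[, 2])  # character matrix with 3 rows and 1 column
col_vec <- dt[[2]]             # plain character vector

class(col_dt)
dim(col_mat)
is.character(col_vec) && is.null(dim(col_vec))
```

Only the last form is the plain vector that vectorised string functions are designed for, which is why dt[[2]] (or simply y inside the brackets) needs no conversion.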
Upvotes: 2