dmi3kno

Reputation: 3045

Multiple regex match and assignment in data.table

I am trying to match a regex with several capture groups and assign the captured values in place to several new columns of a data.table:

library(data.table)
library(stringr)

fruit_regex <- "(\\d+): apples=(.*), oranges=(.*)"

DT <- data.table(V1=c("1: apples=0.1, oranges=0.01",
            "2: apples=0.2, oranges=0.02",
            "3: apples=0.3, oranges=0.03",
            "4: apples=0.4, oranges=0.04",
            "5: apples=0.5, oranges=0.05"))

DT[, c("txt","id","apples", "oranges"):= as.list(str_match_all(V1, fruit_regex))]

This fails with the following warning:

>Warning messages:
>1: In `[.data.table`(DT, , `:=`(c("txt", "id", "apples", "oranges"),  :
>  Supplied 4 columns to be assigned a list (length 5) of values (1 unused)

str_match_all() is documented as vectorized over string and pattern, but for some reason I cannot get it to work.

My regex has another known issue: the result includes a redundant full match, which could be avoided with lookaround assertions.

Desired result (ignoring the redundant V1 and txt columns):

id apples oranges
1      0.1   0.01
2      0.2   0.02
3      0.3   0.03
4      0.4   0.04
5      0.5   0.05

Upvotes: 3

Views: 251

Answers (1)

Shahar Bental

Reputation: 1001

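str_match_all() returns a list with one match matrix per input string, so as.list() hands := five elements for four columns. You need to transform the result into something with one element per column, such as a data frame. For example, using the "plyr" package: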

library(data.table)
library(stringr)
library(plyr)
fruit_regex <- "(\\d+): apples=(.*), oranges=(.*)"

DT <- data.table(V1=c("1: apples=0.1, oranges=0.01",
        "2: apples=0.2, oranges=0.02",
        "3: apples=0.3, oranges=0.03",
        "4: apples=0.4, oranges=0.04",
        "5: apples=0.5, oranges=0.05"))

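# str_match_all() returns one matrix per input string;
# ldply() row-binds those matrices into a single data frame (one column per group)
DT[, c("txt", "id", "apples", "oranges") := ldply(str_match_all(V1, fruit_regex))]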
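
If you would rather not pull in plyr, a possible alternative is str_match() (without the _all suffix), which returns a single character matrix with one row per input string. The sketch below is only one way to do it and assumes each string contains exactly one match; the names m and new_cols are illustrative and not part of the question. It also drops the redundant full-match column and converts the captured text to numeric.

```r
library(data.table)
library(stringr)

fruit_regex <- "(\\d+): apples=(.*), oranges=(.*)"

DT <- data.table(V1 = c("1: apples=0.1, oranges=0.01",
                        "2: apples=0.2, oranges=0.02",
                        "3: apples=0.3, oranges=0.03",
                        "4: apples=0.4, oranges=0.04",
                        "5: apples=0.5, oranges=0.05"))

# str_match() returns one character matrix: a row per input string,
# column 1 is the full match, columns 2-4 are the capture groups
m <- str_match(DT$V1, fruit_regex)

# Hypothetical helper vector holding the new column names
new_cols <- c("id", "apples", "oranges")

# Drop the full-match column and assign the three groups by reference
DT[, (new_cols) := as.data.table(m[, -1, drop = FALSE])]

# The captured values are character strings; convert them to numeric
DT[, (new_cols) := lapply(.SD, as.numeric), .SDcols = new_cols]
```

Because the full match is removed by indexing the matrix, the regex itself does not need lookaround assertions for this approach.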

Upvotes: 3
