Separate string into columns by extracting all groups that match regex

Question

I have these strings in every row of one column.

example_df <- tibble(string = c("[{\"positieVergelekenMetSchooladvies\":\"boven niveau\",\"percentage\":9.090909090909092,\"percentageVergelijking\":19.843418733556412,\"volgorde\":10},{\"positieVergelekenMetSchooladvies\":\"op niveau\",\"percentage\":81.81818181818181,\"percentageVergelijking\":78.58821425834631,\"volgorde\":20},{\"positieVergelekenMetSchooladvies\":\"onder niveau\",\"percentage\":9.090909090909092,\"percentageVergelijking\":1.5683670080972694,\"volgorde\":30}]"))

I'm only interested in the numbers. This regex works:

example_df %>% 
  .$string %>% 
  str_extract_all(., "[0-9]+\\.[0-9]+")

Instead of separate(), I want to use extract(). My understanding is that the two differ in what the regex matches: with separate(), sep= matches the separators between values, whereas with extract(), regex= matches the values that populate the new columns. But while separate() splits at every match of sep=, extract() seems to match only one group.

example_df %>% 
  extract(string, 
           into = c("boven_niveau_school",
                    "boven_niveau_verg",
                    "op_niveau_school",
                    "op_niveau_verg",
                    "onder_niveau_school",
                    "onder_niveau_verg"),
           regex = "([0-9]+\\.[0-9]+)")

What am I doing wrong?

Ronak Shah · Accepted Answer

Instead of separate() or extract(), I would extract all the numbers from the string and then use unnest_wider() to create the new columns.

library(tidyverse)

example_df %>%
  mutate(temp = str_extract_all(string, "[0-9]+\\.[0-9]+")) %>%
  unnest_wider(temp)

You can rename the columns as per your choice.
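
One possible way to do the renaming is to name the extracted values before widening, so that unnest_wider() picks the names up as column names. This is a minimal sketch, not necessarily what the answer had in mind. It uses a shortened, hypothetical input with rounded values in the same shape as the strings in the question, a helper vector new_names that is not part of the original post, and it assumes every row yields exactly six decimal numbers in the order shown.

```r
library(tidyverse)

# Hypothetical, shortened input shaped like the strings in the question
example_df <- tibble(string = "[{\"percentage\":9.09,\"percentageVergelijking\":19.84,\"volgorde\":10},{\"percentage\":81.82,\"percentageVergelijking\":78.59,\"volgorde\":20},{\"percentage\":9.09,\"percentageVergelijking\":1.57,\"volgorde\":30}]")

# Column names taken from the question; the ordering is an assumption
new_names <- c("boven_niveau_school", "boven_niveau_verg",
               "op_niveau_school", "op_niveau_verg",
               "onder_niveau_school", "onder_niveau_verg")

result <- example_df %>%
  mutate(
    # Pull out every decimal number; the integer "volgorde" values do not match
    temp = str_extract_all(string, "[0-9]+\\.[0-9]+"),
    # Convert to numeric and name each element (errors if a row does not have six matches)
    temp = map(temp, ~ set_names(as.numeric(.x), new_names))
  ) %>%
  unnest_wider(temp)

result
```

As for extract() itself: it creates one column per capture group, so a pattern with a single group cannot fill six names in into=. A pattern with six groups would be needed, which is why collecting all matches with str_extract_all() is usually the simpler route here.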
