Tdebeus

Reputation: 1599

Separate string into columns by extracting all groups that match a regex

I have these strings in every row of one column.

example_df <- tibble(string = c("[{\"positieVergelekenMetSchooladvies\":\"boven niveau\",\"percentage\":9.090909090909092,\"percentageVergelijking\":19.843418733556412,\"volgorde\":10},{\"positieVergelekenMetSchooladvies\":\"op niveau\",\"percentage\":81.81818181818181,\"percentageVergelijking\":78.58821425834631,\"volgorde\":20},{\"positieVergelekenMetSchooladvies\":\"onder niveau\",\"percentage\":9.090909090909092,\"percentageVergelijking\":1.5683670080972694,\"volgorde\":30}]"))

I'm only interested in the numbers. This regex works:

example_df %>% 
  .$string %>% 
  str_extract_all(., "[0-9]+\\.[0-9]+")

Instead of separate(), I want to use extract(). My understanding is that extract() matches the values you want to put in the new columns, whereas separate() matches the separator between them. But while separate() splits at every match of sep, extract() seems to match only one group.

example_df %>% 
  extract(string, 
           into = c("boven_niveau_school",
                    "boven_niveau_verg",
                    "op_niveau_school",
                    "op_niveau_verg",
                    "onder_niveau_school",
                    "onder_niveau_verg"),
           regex = "([0-9]+\\.[0-9]+)")

What am I doing wrong?

Upvotes: 0

Views: 78

Answers (2)

akrun

Reputation: 887118

We can use regmatches/gregexpr from base R:

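# Extract every decimal number from the first (and here only) row
out <- regmatches(example_df$string, gregexpr("\\d+\\.\\d+", example_df$string))[[1]]
# Add one new column per extracted number
example_df[paste0("new", seq_along(out))] <- as.list(out)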
example_df
# A tibble: 1 x 7
#  string                                                                     new1        new2         new3        new4       new5       new6       
#  <chr>                                                                      <chr>       <chr>        <chr>       <chr>      <chr>      <chr>      
#1 "[{\"positieVergelekenMetSchooladvies\":\"boven niveau\",\"percentage\":9… 9.09090909… 19.84341873… 81.8181818… 78.588214… 9.0909090… 1.56836700…
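
The snippet above indexes `[[1]]`, so it only handles the first row. For a column with several rows, a minimal sketch along the same lines might stack the matches into a matrix. The two shortened strings below are made up for illustration, and the code assumes every row yields the same number of matches:

```r
# Hypothetical, shortened input: two rows with four decimal numbers each
strings <- c(
  '[{"percentage":9.09,"percentageVergelijking":19.84},{"percentage":81.82,"percentageVergelijking":78.59}]',
  '[{"percentage":1.5,"percentageVergelijking":2.5},{"percentage":3.5,"percentageVergelijking":4.5}]'
)

# One character vector of matches per row
matches <- regmatches(strings, gregexpr("\\d+\\.\\d+", strings))

# Stack the per-row vectors into a matrix (assumes equal match counts per row)
mat <- do.call(rbind, matches)
colnames(mat) <- paste0("new", seq_len(ncol(mat)))

# Bind the new columns to the original strings and convert them to numeric
out <- data.frame(string = strings, mat, stringsAsFactors = FALSE)
out[-1] <- lapply(out[-1], as.numeric)
```

If rows can contain different numbers of matches, `rbind` would recycle the shorter vectors, so the match counts should be checked first.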

Upvotes: 0

Ronak Shah

Reputation: 388982

Instead of separate() or extract(), I would extract all the numbers from the string with str_extract_all() and then use unnest_wider() to create the new columns.

library(tidyverse)

example_df %>%
  mutate(temp = str_extract_all(string, "[0-9]+\\.[0-9]+")) %>%
  unnest_wider(temp)

You can then rename the columns as you like.
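
As a sketch of that renaming step, something like the following might work. The shortened input string is made up for illustration. Passing `names_sep` to `unnest_wider()` is an addition here: depending on the tidyr version, unnesting unnamed elements without it may warn or fail. The column names are the ones from the question:

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical, shortened version of the string from the question
example_df <- tibble(
  string = '[{"percentage":9.09,"percentageVergelijking":19.84,"volgorde":10},{"percentage":81.82,"percentageVergelijking":78.59,"volgorde":20},{"percentage":9.09,"percentageVergelijking":1.57,"volgorde":30}]'
)

# Target column names, in the order the numbers appear in the string
new_names <- c("boven_niveau_school", "boven_niveau_verg",
               "op_niveau_school", "op_niveau_verg",
               "onder_niveau_school", "onder_niveau_verg")

result <- example_df %>%
  # List column holding all decimal numbers found in each string
  mutate(temp = str_extract_all(string, "[0-9]+\\.[0-9]+")) %>%
  # Spread the list into columns temp_1 ... temp_6
  unnest_wider(temp, names_sep = "_") %>%
  # Swap the generated names for the desired ones
  rename_with(~ new_names, starts_with("temp_")) %>%
  # The matches are character strings, so convert them to numeric
  mutate(across(all_of(new_names), as.numeric))
```

This also shows why the attempt in the question fails: extract() needs one capture group per name in `into`, while the regex there defines a single group. Extracting all matches first avoids writing six groups by hand.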

Upvotes: 1
