foc

Reputation: 967

Split tagged text into columns

Given text that contains formatting tags, such as

df <- data.frame(id = c(1, 2), text = c("something here <h1>my text</h1> also <h1>Keep it</h1>", "<h1>title</h1> another here"))

#   id                                                  text
# 1  1 something here <h1>my text</h1> also <h1>Keep it</h1>
# 2  2                           <h1>title</h1> another here

How can I split the text into separate columns, based on where each <h1> </h1> pair starts and ends? Each column should be named after a heading and hold the text that follows it. Example of the desired output:

data.frame(id = c(1, 2), my_text = c("also", 0), keep_it = c(0, 0), title = c(0, "another here"))

#   id my_text keep_it        title
# 1  1    also       0            0
# 2  2       0       0 another here

Insert 0 instead of NA where no text follows a heading, or where that heading does not appear in a given row of the input.

Upvotes: 1

Views: 38

Answers (1)

Darren Tsai

Reputation: 35554

A tidyverse solution:

library(tidyverse)

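# For each string, capture every <h1> heading and the text that follows it,
# then turn the heading/text pairs into a named vector (one output row).
map_dfr(df$text, ~ str_match_all(.x, "<h1>(.*?)</h1>([^<]*)")[[1]] %>%
    as.data.frame %>% select(-1) %>% deframe) %>%
  # Trim whitespace and replace missing or empty values with 0.
  mutate(across(everything(), ~ str_squish(.x) %>%
    replace(is.na(.x) | .x == "", 0)))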

# # A tibble: 2 x 3
#   `my text` `Keep it` title       
#   <chr>     <chr>     <chr>       
# 1 also      0         0           
# 2 0         0         another here
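
To see what the regular expression does before the rows are combined, here is a rough sketch that processes only the first input string. The variable names (x, m, headings, trailing, row1) are made up for illustration and are not part of the solution above.

```r
library(stringr)

# First input string from the question
x <- "something here <h1>my text</h1> also <h1>Keep it</h1>"

# One matrix row per <h1>...</h1> match:
# column 1 = full match, column 2 = heading, column 3 = text after the heading
m <- str_match_all(x, "<h1>(.*?)</h1>([^<]*)")[[1]]

headings <- m[, 2]
trailing <- str_squish(m[, 3])      # trim surrounding whitespace
trailing[trailing == ""] <- "0"     # nothing followed the heading

# Named vector: names become column names, values become the row's cells
row1 <- setNames(trailing, headings)
row1
```

Each input string yields one such named vector, and map_dfr() stacks them into rows, filling headings that are absent from a row with NA, which the mutate() step then turns into 0.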

Upvotes: 2
