Reputation: 967
Given a text column that contains formatting tags such as
df <- data.frame(id = c(1, 2), text = c("something here <h1>my text</h1> also <h1>Keep it</h1>", "<h1>title</h1> another here"))
# id text
# 1 1 something here <h1>my text</h1> also <h1>Keep it</h1>
# 2 2 <h1>title</h1> another here
How can I split the text into separate columns, one per <h1> </h1> tag, where each column
holds the text that follows that tag up to the next <h1>? Example of the desired output:
data.frame(id = c(1, 2), my_text = c("also", 0), keep_it = c(0, 0), title = c(0, "another here"))
# id my_text keep_it title
# 1 1 also 0 0
# 2 2 0 0 another here
Insert 0 instead of NA when no text follows a tag, or when a tag's column does not occur in that row of the input.
Upvotes: 1
Views: 38
Reputation: 35554
A tidyverse solution:
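library(tidyverse)
# For each string, capture every <h1> heading and the text that follows it,
# turn the (heading, text) pairs into a named vector, and row-bind the results.
map_dfr(df$text, ~ str_match_all(.x, "<h1>(.*?)</h1>([^<]*)")[[1]] %>%
          as.data.frame() %>% select(-1) %>% deframe()) %>%
  # Trim whitespace, then replace missing or empty cells with 0.
  mutate(across(everything(), ~ str_squish(.x) %>%
                  replace(is.na(.x) | .x == "", 0)))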
# # A tibble: 2 x 3
# `my text` `Keep it` title
# <chr> <chr> <chr>
# 1 also 0 0
# 2 0 0 another here
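For comparison, the same idea can be sketched in base R without the tidyverse. This is only an illustration, not part of the answer above: the helper name `split_after_h1` and the variables `parts`, `cols`, `out` and `result` are made up here, and the sketch assumes headings are not nested and that a repeated heading within one string may simply keep its last value.

```r
# Input from the question, assigned to `df` (the name the answer uses).
df <- data.frame(
  id = c(1, 2),
  text = c("something here <h1>my text</h1> also <h1>Keep it</h1>",
           "<h1>title</h1> another here")
)

# Hypothetical helper: for one string, return a named character vector
# whose names are the heading texts and whose values are the text
# following each heading (up to the next "<").
split_after_h1 <- function(x) {
  m <- regmatches(x, gregexpr("<h1>.*?</h1>[^<]*", x, perl = TRUE))[[1]]
  heads <- sub("^<h1>(.*?)</h1>.*$", "\\1", m, perl = TRUE)
  tails <- trimws(sub("^<h1>.*?</h1>", "", m, perl = TRUE))
  setNames(tails, heads)
}

parts <- lapply(df$text, split_after_h1)
cols <- unique(unlist(lapply(parts, names)))

# Start from a character matrix filled with "0", then overwrite the
# cells for which some non-empty text follows the heading.
out <- matrix("0", nrow = length(parts), ncol = length(cols),
              dimnames = list(NULL, cols))
for (i in seq_along(parts)) {
  p <- parts[[i]]
  p <- p[nzchar(p)]          # drop headings with nothing after them
  out[i, names(p)] <- p
}

# check.names = FALSE keeps column names such as "my text" unchanged.
result <- data.frame(id = df$id, out, check.names = FALSE,
                     stringsAsFactors = FALSE)
print(result)
```

Unlike the tidyverse version, this keeps the `id` column, and every other column is character, so the filler is the string "0" rather than a numeric zero.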
Upvotes: 2