Reputation: 71
Here is my data:
text <- "**9 Mr.ABCD. Content1. Mrs. DEFG.Content2. **8 Mr.DBC something else. Content3."
How can I get a data frame like the one below?
9 Mr. ABCD. Content1
9 Mrs. DEFG. Content2
8 Mr. DBC. Content3
3 rows, 4 variables (number, Mr./Mrs., name, content)
The names in my data always come after Mr. or Mrs. and are always in uppercase. There is always a period before the content that I want.
Generally speaking, I want to know who said what (along with the number label).
Thanks!
Upvotes: 0
Views: 45
Reputation: 887851
We may do
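library(stringr)
library(tidyr)
library(dplyr)
tibble(col1 = text) %>%
  # split into one row per statement, right after each "ContentN."
  separate_rows(col1, sep = "(?<=Content\\d\\.)\\s+") %>%
  # grp holds the number label; fill() carries it down to rows without one
  mutate(grp = readr::parse_number(col1)) %>%
  fill(grp) %>%
  # drop the leading "**N" marker and prepend the label to every row
  mutate(col1 = str_c(grp, str_remove(col1, "^[*]+\\d+\\s*"), sep = " "),
         grp = NULL) %>%
  pull(col1)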
-output
[1] "9 Mr.ABCD. Content1." "9 Mrs. DEFG.Content2." "8 Mr.DBC something else. Content3."
text <- "**9 Mr.ABCD. Content1. Mrs. DEFG.Content2. **8 Mr.DBC something else. Content3."
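If the four separate columns from the question are needed (number, Mr./Mrs., name, content), one possible sketch is to capture each piece with `str_match_all()` and then fill the missing number labels downward. It assumes the content is always the literal token `Content` followed by digits, as in the sample, and that a statement without a `**N` marker belongs to the most recent label. The column names and the `pattern` and `result` objects are made up for illustration.

```r
library(stringr)
library(tidyr)
library(dplyr)

text <- "**9 Mr.ABCD. Content1. Mrs. DEFG.Content2. **8 Mr.DBC something else. Content3."

# One match per statement:
#   (?:\*\*(\d+)\s*)?  optional "**<number>" label (group 1, NA when absent)
#   (Mrs?\.)           title (group 2)
#   ([A-Z]+)           uppercase name (group 3)
#   .*?(Content\d+)    lazily skip ahead to the content token (group 4)
# Assumption: content is always "Content<digits>", as in the sample data.
pattern <- "(?:\\*\\*(\\d+)\\s*)?(Mrs?\\.)\\s*([A-Z]+).*?(Content\\d+)"

# Column 1 of the match matrix is the full match; columns 2-5 are the groups.
m <- str_match_all(text, pattern)[[1]]

result <- tibble(
  number  = as.integer(m[, 2]),
  title   = m[, 3],
  name    = m[, 4],
  content = m[, 5]
) %>%
  fill(number)  # carry the last seen label down to unlabeled statements

result
```

On the sample `text`, this should give 3 rows with number 9, 9, 8; title Mr., Mrs., Mr.; name ABCD, DEFG, DBC; and content Content1, Content2, Content3. For real data the `Content\\d+` part of the pattern would need to be replaced with whatever actually delimits the content.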
Upvotes: 1