Foulball
Reputation: 71

How to extract the text from certain places in the file?

Here is my data:

text <- "**9 Mr.ABCD. Content1. Mrs. DEFG.Content2. **8 Mr.DBC something else. Content3."

How can I get a data frame like the one below?

9 Mr. ABCD. Content1
9 Mrs. DEFG. Content2
8 Mr. DBC. Content3

That is, 3 rows and 4 variables (number, Mr./Mrs., name, content).

The names in my data always come after Mr. or Mrs. and are always in uppercase. There is always a period before the content that I want.

Generally speaking, I want to know who said what (along with the number label).

Thanks!

Upvotes: 0

Views: 45

Answers (1)

akrun
Reputation: 887851

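One option is to split the string after each "ContentN.", carry the leading number down to the pieces that lack one, and paste that number in front of each piece: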

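library(stringr)
library(tidyr)
library(dplyr)

tibble(col1 = text) %>%
  # one row per statement: split on the whitespace right after "ContentN."
  separate_rows(col1, sep = "(?<=Content\\d\\.)\\s+") %>%
  # pick up the number label for each row (readr must be installed)
  mutate(grp = readr::parse_number(col1)) %>%
  # rows without a label inherit the one from the row above
  fill(grp) %>%
  # drop the "**N" marker, then prefix every row with its label
  mutate(col1 = str_c(grp, str_remove(col1, "^[*]+\\d+\\s*"), sep = " "),
         grp = NULL) %>%
  pull(col1)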

-output

[1] "9 Mr.ABCD. Content1."               "9 Mrs. DEFG.Content2."              "8 Mr.DBC something else. Content3."

data

text <- "**9 Mr.ABCD. Content1. Mrs. DEFG.Content2. **8 Mr.DBC something else. Content3."
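
The pipeline above returns a character vector, whereas the question asks for four separate columns. Here is a minimal base R sketch of one way to get there. It is a different technique from the tidyverse code above, and it rests on some assumptions: the text starts with a "**N" marker, every statement begins with "Mr." or "Mrs.", those titles never occur inside the content, and the wanted content starts after the first period that follows the name. The names `speaker`, `piece_re`, `field_re` and `result` are made up for illustration.

```r
text <- "**9 Mr.ABCD. Content1. Mrs. DEFG.Content2. **8 Mr.DBC something else. Content3."

# A statement starts with an optional "**N" marker followed by "Mr." or "Mrs."
speaker <- "(?:\\*\\*\\d+\\s*)?Mrs?\\."

# 1. Cut the text into one piece per statement: from a speaker up to
#    (but not including) the next speaker, or up to the end of the string.
piece_re <- paste0(speaker, ".*?(?=\\s+", speaker, "|$)")
pieces <- regmatches(text, gregexpr(piece_re, text, perl = TRUE))[[1]]

# 2. Pull the four fields out of every piece.
#    Groups: number (may be empty), title, uppercase name, content.
#    Anything between the name and the next period is skipped.
field_re <- "^(?:\\*\\*)?(\\d*)\\s*(Mrs?\\.)\\s*([A-Z]+)[^.]*\\.\\s*(.*?)\\.?$"
m <- do.call(rbind, regmatches(pieces, regexec(field_re, pieces, perl = TRUE)))

result <- data.frame(
  number  = as.integer(m[, 2]),  # "" becomes NA for pieces without a marker
  title   = m[, 3],
  name    = m[, 4],
  content = m[, 5],
  stringsAsFactors = FALSE
)

# 3. Carry the last seen number forward (assumes the first piece has one).
has_number <- !is.na(result$number)
result$number <- result$number[has_number][cumsum(has_number)]

print(result)
```

If a piece does not fit `field_re`, `regmatches()` returns an empty vector for it and the `rbind()` step will no longer line up, so it is worth checking that `nrow(m)` equals `length(pieces)` on real data.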

Upvotes: 1
