Chris Ruehlemann

Reputation: 21440

Conditionally paste strings together

The data I have is a vector with sentences cut into pieces.

y <- c("G'day", "world and everybody", "else.", "How's life?", "Hope", "you're", "doing just", "fine.")

I'd like to put the sentences back together.

Expected result:

y
[1] "G'day world and everybody else."
[2] "How's life?"
[3] "Hope you're doing just fine."

The 'rule' is that a sentence starts with a piece that begins with an upper-case letter. Building on this rule, this is what I've tried so far (but the result is anything but satisfactory):

unlist(strsplit(paste0(y[which(grepl("^[A-Z]", y))], " ", y[which(grepl("^[a-z]", y))], collapse = ","), ","))
[1] "G'day world and everybody" "How's life? else."         "Hope you're"               "G'day doing just"         
[5] "How's life? fine."
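
A note on why this attempt goes wrong, as far as I can tell: paste0() pairs its arguments element by element and silently recycles the shorter vector, so the three upper-case pieces are matched against the five lower-case pieces by position, not by where they occur in y. A minimal sketch (the names starts, rest and paired are only for illustration):

```r
y <- c("G'day", "world and everybody", "else.", "How's life?",
       "Hope", "you're", "doing just", "fine.")

starts <- y[grepl("^[A-Z]", y)]  # 3 pieces that begin with an upper-case letter
rest   <- y[grepl("^[a-z]", y)]  # 5 pieces that begin with a lower-case letter

# paste0() is vectorised: element i of `starts` is glued to element i of `rest`,
# and `starts` is recycled once it runs out (no warning is given)
paired <- paste0(starts, " ", rest)
length(paired)
paired[4]
```

The fourth element pairs "G'day" with "doing just" because starts has wrapped around, which is exactly the mix-up seen in the output above.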

EDIT:

I have come up with this solution, which comes close to the expected result (the final periods are dropped, a double space creeps in, and the order changes) but looks ugly:

y1 <-  c(paste0(y[grepl("^[A-Z].*[^.?]$", y, perl = T)], " ", unlist(strsplit(paste0(y[which(grepl("^[a-z]", y))], collapse = " "), "\\."))), y[grepl("^[A-Z].*[.?]$", y, perl = T)])

y1
[1] "G'day world and everybody else" "Hope  you're doing just fine"   "How's life?"

What better solution is there?

EDIT 2:

This is also a good solution:

library(stringr)
str_extract_all(paste(y, collapse = " "), "[A-Z][^.?]*(\\.|\\?)")
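
For reference, str_extract_all() returns a list with one element per input string, so the sentences sit in its first element. A rough base R equivalent, assuming the same pattern is wanted and that every sentence ends in "." or "?" (the names joined and sentences are only for illustration):

```r
y <- c("G'day", "world and everybody", "else.", "How's life?",
       "Hope", "you're", "doing just", "fine.")

# Glue all pieces into one string, then pull out every stretch that starts
# with an upper-case letter and runs up to the next "." or "?"
joined <- paste(y, collapse = " ")
sentences <- regmatches(joined, gregexpr("[A-Z][^.?]*[.?]", joined))[[1]]
sentences
```

Note that this relies on the closing punctuation rather than on the upper-case rule alone, so a trailing fragment without "." or "?" would be dropped.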

Upvotes: 1

Views: 42

Answers (1)

Allan Cameron

Reputation: 174586

I would use gsub to insert a newline before each capital letter that follows a space, then split at the newlines:

unlist(strsplit(gsub(" ([A-Z])", "\n\\1", paste(y, collapse = " ")), "\n"))
#> [1] "G'day world and everybody else." "How's life?"                    
#> [3] "Hope you're doing just fine."

Created on 2020-05-28 by the reprex package (v0.3.0)
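
One caveat: the gsub approach also splits at a capital in the middle of a piece (for example "I" or a proper noun after a space). If that matters, a possible alternative is to apply the rule to the pieces themselves: flag each piece that starts with an upper-case letter, turn the flags into a running sentence number with cumsum(), and paste the pieces within each group. This is only a sketch; the names starts_sentence, sentence_id and result are made up for illustration, and [[:upper:]] is used in place of [A-Z] because range matching can depend on the locale.

```r
y <- c("G'day", "world and everybody", "else.", "How's life?",
       "Hope", "you're", "doing just", "fine.")

# TRUE for every piece that opens a new sentence
starts_sentence <- grepl("^[[:upper:]]", y)

# Running count of sentence starts: 1 1 1 2 3 3 3 3
sentence_id <- cumsum(starts_sentence)

# Paste together the pieces that share a sentence number
result <- unname(vapply(split(y, sentence_id), paste, character(1), collapse = " "))
result
```

Unlike the one-liner above, this only ever breaks between elements of y, never inside one.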

Upvotes: 2
