Reputation: 1576
I'm trying to split each string in a vector into two pieces (I only want to keep the first piece), cutting after the word closest to the 12th character.
Example:
textvec <- c("this is an example", "I hope someone can help me", "Thank you in advance")
Expected result is a vector like this:
"this is an", "I hope someone", "Thank you in"
What I tried so far: I'm able to get the full words that occur before or at the 12th character like this:
t13 <- substr(textvec , 1, 13) #gives me first 13 characters of each string
lastspace <- sapply(gregexpr(" ", t13), FUN=function(x) x[length(x)]) #position of the last space before/at the 13th character
result <- substr(t13, start=1, stop=lastspace - 1) #drop the trailing space
But what I want is to include the word closest to the 12th character (e.g. "someone" in the example above), even if that word ends after the 12th character. In case of a tie, I would like to include the word after the 12th character. I hope I'm explaining myself clearly :)
Upvotes: 3
Views: 250
Reputation: 51582
Using cumsum:
sapply(strsplit(textvec, ' '), function(i) paste(i[cumsum(nchar(i)) <= 12], collapse = ' '))
#[1] "this is an" "I hope someone" "Thank you in"
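To see why this works on the example, here is a small sketch that walks through the second string step by step. The variable names words, running and kept are only for illustration. Note that the running total counts letters only and ignores the spaces, so the cutoff of 12 is a letter count rather than a character position in the original string.

```r
textvec <- c("this is an example", "I hope someone can help me", "Thank you in advance")

# Split the second string into words and take the running total of word lengths.
words <- strsplit(textvec[2], " ")[[1]]
running <- cumsum(nchar(words))
running
#[1]  1  5 12 15 19 21

# Keep the words whose running total does not exceed 12, then rejoin them.
kept <- paste(words[running <= 12], collapse = " ")
kept
#[1] "I hope someone"
```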
Upvotes: 3
Reputation: 887711
We can use gregexpr to find the space closest to position 12 and then cut the string with substr:
substr(textvec, 1, sapply(gregexpr("\\s+", textvec),
function(x) x[which.min(abs(12 - x))])-1)
#[1] "this is an" "I hope someone" "Thank you in"
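If the tie rule from the question matters, or if a string may contain no space at all, a variant along these lines might help. This is only a sketch under my own assumptions: cut_near is a hypothetical helper, not part of the answer above, and it measures the distance from the target to the last character of each word (treating the end of the string as a boundary too), preferring the later boundary on a tie.

```r
textvec <- c("this is an example", "I hope someone can help me", "Thank you in advance")

# Hypothetical helper: cut each string at the word end closest to `target`.
# Assumption: on a tie, the later word end wins.
cut_near <- function(x, target = 12) {
  vapply(x, function(s) {
    spaces <- gregexpr(" ", s, fixed = TRUE)[[1]]
    spaces <- spaces[spaces > 0]        # gregexpr returns -1 when there is no match
    ends <- c(spaces - 1, nchar(s))     # position of the last character of each word
    d <- abs(ends - target)
    best <- max(ends[d == min(d)])      # a tie goes to the later boundary
    substr(s, 1, best)
  }, character(1), USE.NAMES = FALSE)
}

cut_near(textvec)
#[1] "this is an"     "I hope someone" "Thank you in"
```

A string without spaces is returned unchanged, because its only boundary is the end of the string.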
Upvotes: 2