Reputation: 839
I need to insert some missing line breaks into a one-column R data frame. The line breaks were lost during the data collection phase.
The data looks like:
V1
Apple
OrangeBanana
BananaBananaBanana
Watermelon
GrapeBanana
So all the line breaks before "Banana" are missing.
I want to search for "Banana" and add the missing line breaks so that it looks like:
V1
Apple
Orange
Banana
Banana
Banana
Banana
Watermelon
Grape
Banana
Upvotes: 1
Views: 73
Reputation: 15395
Here's a slightly more general solution, which can also easily be adapted to work explicitly with "Banana".
V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana")
First, let's split them up by finding every uppercase letter that is not at a word boundary (i.e., not at the start of a word) and replacing it with a space followed by the same letter:
splits <- gsub("(?:\\B)([[:upper:]])", " \\1", V1, perl = TRUE)
[1] "Apple" "Orange Banana" "Banana Banana Banana" "Watermelon" "Grape Banana"
Then split by the space character and convert from list to vector:
unlist(strsplit(splits, " "))
[1] "Apple" "Orange" "Banana" "Banana" "Banana" "Banana" "Watermelon" "Grape" "Banana"
Or in one line:
unlist(strsplit(gsub("(?:\\B)([[:upper:]])", " \\1", V1, perl = TRUE), " "))
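Since the question starts from a one-column data frame rather than a bare vector, here is a minimal sketch of how the same idea could be wrapped up and written back to a data frame. The helper name `split_on_capitals` and the variables `df` and `fixed` are made up for illustration, and the sketch assumes that none of the values already contain spaces.

```r
# Hypothetical helper (not from the original answer): split run-together
# capitalized words into separate elements.
split_on_capitals <- function(x) {
  # Insert a space before each uppercase letter that is not at a word boundary
  spaced <- gsub("\\B([[:upper:]])", " \\1", x, perl = TRUE)
  # Split on the inserted spaces and flatten the resulting list to a vector
  unlist(strsplit(spaced, " ", fixed = TRUE))
}

# Example data frame mirroring the question's data
df <- data.frame(
  V1 = c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana"),
  stringsAsFactors = FALSE
)

# Rebuild a one-column data frame with one fruit per row
fixed <- data.frame(V1 = split_on_capitals(df$V1), stringsAsFactors = FALSE)
```

Note that this splits before any mid-word capital, so a value such as "McIntosh" would also be broken apart; if that is a concern, a pattern restricted to the known word is probably safer.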
EDIT: For a regex that works explicitly with "Banana":
gsub("(?:\\B)(Banana)", " \\1", V1, perl = TRUE)
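To go all the way to the one-row-per-fruit result with the "Banana"-specific pattern, something along these lines should work. It is only a sketch: it uses a newline instead of a space as the temporary separator so that values which legitimately contain spaces are left intact, and the extra "Dragon FruitBanana" entry is invented here purely to show that case.

```r
# Data from the question plus one made-up entry containing a space
V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon",
        "GrapeBanana", "Dragon FruitBanana")

# Insert a newline before every "Banana" that is glued to the preceding word
with_breaks <- gsub("\\B(Banana)", "\n\\1", V1, perl = TRUE)

# Split on the inserted newlines and flatten the list to a character vector
pieces <- unlist(strsplit(with_breaks, "\n", fixed = TRUE))

# Put the result back into a one-column data frame
result <- data.frame(V1 = pieces, stringsAsFactors = FALSE)
```

This assumes the original values contain no newline characters of their own; if they might, pick some other separator that cannot occur in the data.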
Upvotes: 3