jinlong

Reputation: 839

How to insert the missing line breaks in an R data frame

I need to insert some missing line breaks into a one-column R data frame. The line breaks were lost during the data collection phase.

The data looks like:

V1
Apple
OrangeBanana
BananaBananaBanana
Watermelon
GrapeBanana

That is, every line break before "Banana" is missing.

I want to search for "Banana" and insert the missing line breaks so that the result looks like:

V1
Apple
Orange
Banana
Banana
Banana
Banana
Watermelon
Grape
Banana
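A reproducible version of the input and the expected result, assuming the column is a character vector named V1 (the object names df and expected are illustrative, not from the original post):

```r
# Input as described above: a one-column data frame with column V1.
# stringsAsFactors = FALSE keeps V1 as character (the default since R 4.0.0).
df <- data.frame(
  V1 = c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana"),
  stringsAsFactors = FALSE
)

# The desired result once the missing line breaks are restored.
expected <- data.frame(
  V1 = c("Apple", "Orange", "Banana", "Banana", "Banana", "Banana",
         "Watermelon", "Grape", "Banana"),
  stringsAsFactors = FALSE
)
```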

Upvotes: 1

Views: 73

Answers (1)

sebastian-c

Reputation: 15395

Here's a slightly more general solution, which can easily be adapted to work explicitly with "Banana".

V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana")

First, split the entries apart by finding every upper-case letter that is not preceded by a word boundary (\B) and replacing it with a space followed by that same letter:

splits <- gsub("(?:\\B)([[:upper:]])", " \\1", V1, perl = TRUE)
splits
# [1] "Apple" "Orange Banana" "Banana Banana Banana" "Watermelon" "Grape Banana"
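To see why a leading capital such as the "A" in "Apple" is left alone, gregexpr() can report where the pattern matches. A small check on a single input string (the name pos is illustrative):

```r
# 1-based character positions where the pattern matches in one string.
# The first "B" follows the start of the string, which counts as a word
# boundary, so \B fails there; only the two inner "B"s match.
pos <- as.integer(gregexpr("(?:\\B)([[:upper:]])", "BananaBananaBanana", perl = TRUE)[[1]])
pos
# [1]  7 13
```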

Then split on the space character and flatten the resulting list into a vector with unlist():

unlist(strsplit(splits, " "))
# [1] "Apple" "Orange" "Banana" "Banana" "Banana" "Banana" "Watermelon" "Grape" "Banana"
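The space used as a separator is arbitrary. If the real data can already contain spaces (for example "Passion Fruit"), splitting on " " would also break those entries apart. A variant that inserts an actual newline instead, assuming the data never contain "\n" themselves (with_breaks, fruits and passion are illustrative names):

```r
V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana")

# Insert a real line break before each upper-case letter not at a word boundary...
with_breaks <- gsub("(?:\\B)([[:upper:]])", "\n\\1", V1, perl = TRUE)

# ...then split on that line break; fixed = TRUE treats "\n" as a literal string.
fruits <- unlist(strsplit(with_breaks, "\n", fixed = TRUE))
fruits

# An existing space is kept: the "F" after the space sits at a word boundary,
# so it is not matched, and only the "B" of "Banana" triggers a split.
passion <- unlist(strsplit(gsub("(?:\\B)([[:upper:]])", "\n\\1", "Passion FruitBanana",
                                perl = TRUE), "\n", fixed = TRUE))
passion
```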

Or in one line:

unlist(strsplit(gsub("(?:\\B)([[:upper:]])", " \\1", V1, perl = TRUE), " "))
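Since the question is about a data frame rather than a bare vector, the one-liner can be applied to the column and the result wrapped back into a one-column data frame. A sketch, assuming the data frame is called df and its column is V1 (df and df_fixed are illustrative names):

```r
df <- data.frame(
  V1 = c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana"),
  stringsAsFactors = FALSE
)

# Apply the one-liner to the column; the result has more rows than the input,
# so build a new data frame instead of assigning back into df$V1.
df_fixed <- data.frame(
  V1 = unlist(strsplit(gsub("(?:\\B)([[:upper:]])", " \\1", df$V1, perl = TRUE), " ")),
  stringsAsFactors = FALSE
)
df_fixed
```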

EDIT: To split only before "Banana", name it explicitly in the gsub() step and then split on the space as before:

gsub("(?:\\B)(Banana)", " \\1", V1, perl = TRUE)
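The EDIT above shows only the gsub() step. A minimal sketch of the complete Banana-only pipeline, reusing the same strsplit()/unlist() step (banana_split and other_caps are illustrative names):

```r
V1 <- c("Apple", "OrangeBanana", "BananaBananaBanana", "Watermelon", "GrapeBanana")

# Only an inner "Banana" triggers a split; a leading "Banana" is at a word
# boundary and is left alone, so no empty strings are produced.
banana_split <- unlist(strsplit(gsub("(?:\\B)(Banana)", " \\1", V1, perl = TRUE), " "))
banana_split

# Other inner capitals are not touched by this pattern.
other_caps <- gsub("(?:\\B)(Banana)", " \\1", "PassionFruitBanana", perl = TRUE)
other_caps
```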

Upvotes: 3
