BlackHat

Reputation: 755

Regular Expressions - Starts with, Contains, and Ends with

I have a string that contains several "\n" characters. I would like to look at each line and remove every line that contains the word "banana".

Sample DF:

farm_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'), stringsAsFactors=FALSE)

What I've tried:

farm_data$sentence <- gsub(".* bananas .* \n", "\n", farm_data$sentence)

What I want:

clean_data <- data.frame(shop=c('fruit'),
                        sentence=c('the basket contains apples
                                  why not just boil the fruits'), stringsAsFactors=FALSE)

Lines that contain banana have been removed.

Thanks.

Upvotes: 1

Views: 160

Answers (2)

Tad Dallas

Reputation: 1189

I address the question in perhaps a roundabout way. I first split the string on the line-break character \n.

sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))

After that I remove those elements of the resulting split that contain the word "banana".

# grepl() returns one TRUE/FALSE per line, so this also works when no line matches
cleanSentence <- sentence[!grepl('banana', sentence)]

Then I hammer it back together using the paste function.

clean_data <- data.frame(shop=c('fruit'),
                        sentence= paste(cleanSentence, collapse=' \n'), stringsAsFactors=FALSE)

Hopefully this isn't too ham-fisted. :)

To address your concern about the usability to other "fruits" or strings:

cleanFruit <- function(fruit = 'banana'){
    # split the single string into its individual lines
    sentence <- unlist(strsplit(as.character(farm_data$sentence), '\n'))
    # keep only the lines that do not contain `fruit`
    cleanSentence <- sentence[!grepl(fruit, sentence)]
    clean_data <- data.frame(shop=c('fruit'),
                            sentence= paste(cleanSentence, collapse=' \n'), stringsAsFactors=FALSE)
    return(clean_data)
}

Wrap it in a function and pass it the fruit (or any other word) to filter on. @rawr's answer seems a bit cleaner.
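The function above reads farm_data from the global environment and assumes a single row. A minimal sketch of a more general version follows, assuming the text lives in a character vector and each element should be cleaned independently. The name drop_lines_with, its arguments, and the second sample row are hypothetical and not part of the original answer; the trimws() call, which strips the indentation left over from a multi-line literal, is also an assumption about what is wanted.

```r
# Hypothetical helper: drop every line containing `word` from each element
# of a character vector whose lines are separated by '\n'.
drop_lines_with <- function(text, word) {
  vapply(strsplit(text, '\n', fixed = TRUE), function(lines) {
    # Assumption: leading/trailing whitespace on each line is not wanted
    lines <- trimws(lines)
    # fixed = TRUE treats `word` as a literal string, not a regex
    paste(lines[!grepl(word, lines, fixed = TRUE)], collapse = '\n')
  }, character(1))
}

# Made-up two-row sample to show that each row is handled separately
farm_data <- data.frame(
  shop = c('fruit', 'veg'),
  sentence = c('the basket contains apples\n  bananas are the best\n  why not just boil the fruits',
               'carrots are fine\nbanana bread is not a vegetable'),
  stringsAsFactors = FALSE
)

farm_data$sentence <- drop_lines_with(farm_data$sentence, 'banana')
```

Passing the text in as an argument keeps the helper usable on any column, and vapply() guarantees one cleaned string per input row.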

Upvotes: 1

rawr

Reputation: 20811

x <- 'the basket contains apples
                                  bananas are the best
                                  are we going to eat bananas
                                  why not just boil the fruits
                                  let us make some banana smoothie'
cat(x)
# the basket contains apples
#                                   bananas are the best
#                                   are we going to eat bananas
#                                   why not just boil the fruits
#                                   let us make some banana smoothie

cat(gsub('.*banana.*\\n?', '', x, perl = TRUE))
# the basket contains apples
#                                   why not just boil the fruits
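To apply the same substitution to the data frame column from the question, something along these lines should work. It is only a sketch: the sample string is written without indentation so that no leading spaces end up in it, and the final sub() that trims a leftover trailing newline is an addition, not part of the one-liner above.

```r
farm_data <- data.frame(shop = c('fruit'),
                        sentence = c('the basket contains apples
bananas are the best
are we going to eat bananas
why not just boil the fruits
let us make some banana smoothie'), stringsAsFactors = FALSE)

# With perl = TRUE, '.' does not match a newline, so '.*banana.*' stays
# within one line; '\\n?' also consumes that line's line break if present.
cleaned <- gsub('.*banana.*\\n?', '', farm_data$sentence, perl = TRUE)

# Assumption: a trailing newline left behind when the last line is removed
# is unwanted, so strip it.
farm_data$sentence <- sub('\\n$', '', cleaned, perl = TRUE)

cat(farm_data$sentence)
```

Because gsub() is vectorized, the same two calls handle a column with many rows without any explicit loop.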

Upvotes: 3
