fokos

Reputation: 653

How can I remove a word before and after a particular word?

I have a huge text file of names, and I want to remove the full name of everyone with a particular middle name. For example, let's say I want to remove everyone with the middle name Tom:

Row 1: John Andrew Smith, Tobias Tom Jones, Anton Morvol Hert, Andy Tom Smith, ...

Row 2: Wade Tom Jobs, Randal Robert Rodes, ...

Thanks

Upvotes: 1

Views: 51

Answers (2)

akrun

Reputation: 887561

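We can do this in base R by splitting each row into individual names and dropping the ones where "Tom" appears between two spaces: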

grep("\\s\\bTom\\b\\s", unlist(strsplit(df$V1, ", ")), 
       invert = TRUE, value = TRUE)
#[1] "John Andrew Smith" "Anton Morvol Tom"  "Tom Robert Rodes" 

data

df <- structure(list(V1 = c(
  "John Andrew Smith, Tobias Tom Jones, Anton Morvol Tom, Andy Tom Smith",
  "Wade Tom Jobs, Tom Robert Rodes")),
  row.names = c(NA, -2L), class = "data.frame")
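If you also need to keep one row per input line in base R, here is a minimal sketch that applies the same strsplit() + grep() idea row by row and joins the survivors back with ", ". The names rows and kept are illustrative and not part of the answer above.

```r
# Illustrative input: each element is one row of the text file
rows <- c("John Andrew Smith, Tobias Tom Jones, Anton Morvol Tom, Andy Tom Smith",
          "Wade Tom Jobs, Tom Robert Rodes")

# For every row: split into names, drop names with "Tom" between two spaces,
# then join the remaining names back together with ", "
kept <- vapply(strsplit(rows, ", "), function(names) {
  paste(grep("\\s\\bTom\\b\\s", names, invert = TRUE, value = TRUE),
        collapse = ", ")
}, character(1))

kept
#[1] "John Andrew Smith, Anton Morvol Tom" "Tom Robert Rodes"
```

A row in which every name is removed comes back as an empty string "" rather than disappearing.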

Upvotes: 1

Ronak Shah

Reputation: 389175

You can read the text file into R, split the comma-separated values into separate rows with separate_rows, and then remove the rows that have "Tom" as the middle name. I would suggest keeping the data this way, with each entry in its own row.

library(dplyr)

df %>%
  tidyr::separate_rows(V1, sep = ", ") %>%
  filter(!grepl("\\w\\s*Tom\\s*\\w", V1))

#                V1
#1 John Andrew Smith
#2  Anton Morvol Tom
#3  Tom Robert Rodes
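One caveat with the pattern "\\w\\s*Tom\\s*\\w": because \\s* allows zero spaces, it also matches a middle name that merely starts with "Tom". The sketch below shows this with a made-up name, "Anton Tomlinson Smith", and a hypothetical helper, has_middle_name(), that instead compares whole words between the first and last name. It assumes the parts of a name are separated by whitespace.

```r
# Hypothetical helper: TRUE when any word strictly between the first and
# last word of a name is exactly `middle`
has_middle_name <- function(full_names, middle) {
  vapply(strsplit(full_names, "\\s+"), function(words) {
    n <- length(words)
    n >= 3 && any(words[-c(1, n)] == middle)
  }, logical(1))
}

people <- c("John Andrew Smith", "Tobias Tom Jones", "Anton Morvol Tom",
            "Tom Robert Rodes", "Anton Tomlinson Smith")

# The regex also flags "Anton Tomlinson Smith", since \\s* may match no space
grepl("\\w\\s*Tom\\s*\\w", people)
#[1] FALSE  TRUE FALSE FALSE  TRUE

# The word comparison keeps every name except "Tobias Tom Jones"
people[!has_middle_name(people, "Tom")]
```

Inside the dplyr pipeline the same check could replace the grepl() call, e.g. filter(!has_middle_name(V1, "Tom")). Swapping \\s* for \\s+ in the regex is a smaller change with a similar effect for "Tomlinson".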

If you want the original structure back (one row per input line):

df %>%
  mutate(row = row_number()) %>%
  tidyr::separate_rows(V1, sep = ", ") %>%
  filter(!grepl("\\w\\s*Tom\\s*\\w", V1)) %>%
  group_by(row) %>%
  summarise(V1 = toString(V1))

data

I changed the input slightly for testing purposes.

text = "John Andrew Smith, Tobias Tom Jones, Anton Morvol Tom, Andy Tom Smith
         Wade Tom Jobs, Tom Robert Rodes"
df <- read.table(text = text, sep = "|", strip.white = TRUE)
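To read the real file instead of an inline string, readLines() returns one element per line, which can go straight into a V1 column; unlike read.table(), it applies no quote or comment.char processing. In this sketch the file is a temporary one written only for the demo, so path is a placeholder for your own file.

```r
# Placeholder input: a temporary file holding the sample rows, one row per line.
# Replace `path` with the location of your own file.
path <- tempfile(fileext = ".txt")
writeLines(c("John Andrew Smith, Tobias Tom Jones, Anton Morvol Tom, Andy Tom Smith",
             "  Wade Tom Jobs, Tom Robert Rodes"), path)

# readLines() returns each line verbatim; trimws() mirrors strip.white = TRUE
df <- data.frame(V1 = trimws(readLines(path)), stringsAsFactors = FALSE)

df$V1
```

The resulting df has the same single V1 column as the read.table() version, so either pipeline above can be applied to it unchanged.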

Upvotes: 2
