Reputation: 653
I have a huge text file of names, and I want to remove the full name of everyone with a particular middle name. For example, say I want to remove everyone whose middle name is Tom:
Row 1: John Andrew Smith, Tobias Tom Jones, Anton Morvol Hert, Andy Tom Smith, ...
Row 2: Wade Tom Jobs, Randal Robert Rodes, ...
Thanks
Upvotes: 1
Views: 51
Reputation: 887561
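We can do this in base R: split each row on ", " and keep only the names that do not have Tom as a whitespace-delimited middle word.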
grep("\\s\\bTom\\b\\s", unlist(strsplit(df$V1, ", ")),
invert = TRUE, value = TRUE)
#[1] "John Andrew Smith" "Anton Morvol Tom" "Tom Robert Rodes"
data
df <- structure(list(V1 = c("John Andrew Smith, Tobias Tom Jones, Anton Morvol Tom, Andy Tom Smith",
                            "Wade Tom Jobs, Tom Robert Rodes")),
                row.names = c(NA, -2L), class = "data.frame")
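Since the question is about a text file, here is a minimal sketch of the same idea applied line by line, so that each row keeps its remaining names. The file paths are hypothetical (a temporary file stands in for the real one), and it assumes the names within a row are separated by a comma and optional whitespace.

```r
# Hypothetical input: a temporary file stands in for the real text file.
infile <- tempfile(fileext = ".txt")
writeLines(c("John Andrew Smith, Tobias Tom Jones, Anton Morvol Tom, Andy Tom Smith",
             "Wade Tom Jobs, Tom Robert Rodes"), infile)

# Read one string per row, split each row into names, drop the names that
# have "Tom" in the middle, and paste the survivors back together.
rows <- readLines(infile)
cleaned <- vapply(strsplit(rows, ",\\s*"), function(entries) {
  toString(entries[!grepl("\\s\\bTom\\b\\s", entries)])
}, character(1), USE.NAMES = FALSE)

# Write the result to a new (again hypothetical) file.
outfile <- tempfile(fileext = ".txt")
writeLines(cleaned, outfile)
readLines(outfile)
#[1] "John Andrew Smith, Anton Morvol Tom" "Tom Robert Rodes"
```

A row whose names are all removed ends up as an empty string, so the line count of the file stays the same.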
Upvotes: 1
Reputation: 389175
You can read the text file into R, split the comma-separated values into separate rows with separate_rows, and then remove the rows that have "Tom" as the middle name. I would suggest keeping the data in this form, with each entry in its own row.
library(dplyr)
df %>%
  tidyr::separate_rows(V1, sep = ", ") %>%
  # \\s+ rather than \\s*, so only a standalone middle "Tom" matches
  # (not "Tommy" or "Tomlinson")
  filter(!grepl("\\w\\s+Tom\\s+\\w", V1))
# V1
#1 John Andrew Smith
#2 Anton Morvol Tom
#3 Tom Robert Rodes
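If you want the original row structure back, number the rows first and collapse the remaining names per row afterwards:
df %>%
  mutate(row = row_number()) %>%
  tidyr::separate_rows(V1, sep = ", ") %>%
  # \\s+ rather than \\s*, so only a standalone middle "Tom" matches
  filter(!grepl("\\w\\s+Tom\\s+\\w", V1)) %>%
  group_by(row) %>%
  summarise(V1 = toString(V1))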
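A note on the whitespace quantifier in the pattern: with `\\s*` the whitespace around Tom is optional, so names that merely start with "Tom" also match, whereas `\\s+` requires real whitespace on both sides. A small check in base R, using made-up names that are not part of the test data:

```r
# Made-up names for illustration only.
entries <- c("John Tommy Smith", "John Andrew Tomlinson", "Tobias Tom Jones")

# Optional whitespace: also flags "Tommy" and "Tomlinson".
loose <- grepl("\\w\\s*Tom\\s*\\w", entries)
loose
#[1] TRUE TRUE TRUE

# Mandatory whitespace: flags only the standalone middle name "Tom".
strict <- grepl("\\w\\s+Tom\\s+\\w", entries)
strict
#[1] FALSE FALSE  TRUE
```

Both patterns give the same result on the test data in this answer; they differ only when "Tom" is the start of a longer name.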
data
Changed the input a bit for testing purposes.
text = "John Andrew Smith, Tobias Tom Jones, Anton Morvol Tom, Andy Tom Smith
Wade Tom Jobs, Tom Robert Rodes"
df <- read.table(text = text, sep = "|", strip.white = TRUE)
Upvotes: 2