Reputation: 648
This is an example excerpt of my data set. It looks as follows:
Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id_234;2018/03/02
I want to delete the words that contain a colon. In this case, these are wa119:d, ax21:3 and bC230:13, so my new data set should look as follows:
Description;ID;Date
Here comes the first row;id_112;2018/03/02
Here comes the second row;id_115;2018/03/02
Here comes the third row;id_234;2018/03/02
Unfortunately, I was not able to find a regular expression (or any other solution) with gsub. Can anyone help?
Upvotes: 4
Views: 2185
Reputation: 109844
Here's one approach:
## read in your data
dat <- read.table(text ='
Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02
', sep = ';', header = TRUE, stringsAsFactors = FALSE)
## \\w+ = one or more word characters
gsub('\\w+:\\w+\\s+', '', dat$Description)
## [1] "Here comes the first row"
## [2] "Here comes the second row"
## [3] "Here comes the third row"
More info on \\w, a shorthand character class that is the same as [A-Za-z0-9_]: https://www.regular-expressions.info/shorthand.html
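To check that the shorthand behaves like that class on ASCII text such as this, the snippet below runs both spellings on the question's Description values. It also shows a limitation: the pattern needs whitespace after the colon word, so a colon word at the very end of a string is left in place. The last part is a possible variant of my own (not part of the answer above) that uses \\S to drop any space-delimited token containing a colon, wherever it sits. Apply that variant to the Description column only; on a raw ";"-separated line, \\S* would also swallow the following fields. The names description, with_shorthand, with_class and general are just illustrative.

```r
# Description values from the question
description <- c("wa119:d Here comes the first row",
                 "ax21:3 Here comes the second row",
                 "bC230:13 Here comes the third row")

# \\w written as the shorthand and as the explicit class it stands for
with_shorthand <- gsub("\\w+:\\w+\\s+", "", description)
with_class     <- gsub("[A-Za-z0-9_]+:[A-Za-z0-9_]+\\s+", "", description)
identical(with_shorthand, with_class)

# Both patterns require whitespace after the colon word,
# so a colon word at the end of a string is not removed
gsub("\\w+:\\w+\\s+", "", "last word is x:y")

# Variant (an assumption, not the answer's pattern): \\S matches any
# non-space character, so any token containing a colon is dropped;
# trimws() cleans up spaces left at the ends of the string
general <- trimws(gsub("\\S*:\\S*\\s*", "", "last word is x:y"))
general
```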
Upvotes: 3
Reputation: 20085
Another solution, which produces exactly the result the OP expects:
#data
df <- read.table(text = "Description;ID;Date
wa119:d Here comes the first row;id_112;2018/03/02
ax21:3 Here comes the second row;id_115;2018/03/02
bC230:13 Here comes the third row;id:234;2018/03/02", stringsAsFactors = FALSE, sep="\n")
gsub("[a-zA-Z0-9]+:[a-zA-Z0-9]+\\s", "", df$V1)
#[1] "Description;ID;Date"
#[2] "Here comes the first row;id_112;2018/03/02"
#[3] "Here comes the second row;id_115;2018/03/02"
#[4] "Here comes the third row;id:234;2018/03/02"
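If you want a data frame again rather than a vector of cleaned lines, the cleaned lines can be parsed back with read.table, since its text argument accepts a character vector with one line per element. This is a small sketch assuming the file fits in memory; in practice raw_lines would presumably come from readLines() on the file, and raw_lines, cleaned_lines and cleaned_df are illustrative names.

```r
# Raw lines as in the answer above (one string per line of the file)
raw_lines <- c("Description;ID;Date",
               "wa119:d Here comes the first row;id_112;2018/03/02",
               "ax21:3 Here comes the second row;id_115;2018/03/02",
               "bC230:13 Here comes the third row;id:234;2018/03/02")

# Strip "word:word " tokens; id:234 survives because it is
# followed by ";" rather than whitespace
cleaned_lines <- gsub("[a-zA-Z0-9]+:[a-zA-Z0-9]+\\s", "", raw_lines)

# Parse the cleaned lines back into a data frame, keeping the header
cleaned_df <- read.table(text = cleaned_lines, sep = ";", header = TRUE,
                         stringsAsFactors = FALSE)
str(cleaned_df)
```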
Upvotes: 0
Reputation: 2105
Suppose the column you want to modify is dat:
dat <- c("wa119:d Here comes the first row",
         "ax21:3 Here comes the second row",
         "bC230:13 Here comes the third row")
Then you can take each element, split it into words, remove the words containing a colon, and then paste what's left back together, yielding what you want:
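dat_colon_words_removed <- unlist(lapply(dat, function(string){
  # split the string into words on single spaces
  words <- strsplit(string, split=" ")[[1]]
  # keep only the words that do not contain a colon
  words <- words[!grepl(":", words)]
  # glue the remaining words back together
  paste(words, collapse=" ")
}))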
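The same split, filter and paste idea can be written as a small reusable function. This sketch uses vapply with character(1) so the result is guaranteed to be a character vector with one element per input, and fixed = TRUE because neither the space nor the colon needs regex matching. remove_colon_words is a hypothetical helper name, not from the answer. Unlike the gsub patterns in the other answers, this also removes colon words in the middle or at the end of a string.

```r
dat <- c("wa119:d Here comes the first row",
         "ax21:3 Here comes the second row",
         "bC230:13 Here comes the third row")

# Hypothetical helper: drop every space-delimited word containing a colon
remove_colon_words <- function(x) {
  vapply(strsplit(x, split = " ", fixed = TRUE), function(words) {
    # keep the words without a colon and join them with single spaces
    paste(words[!grepl(":", words, fixed = TRUE)], collapse = " ")
  }, character(1))
}

remove_colon_words(dat)
remove_colon_words("a b:c d e:f")
```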
Upvotes: 0