Sepideh Shamsizadeh

Reputation: 71

A regex to remove all words that contain numbers in R

I want to write a regex in R that removes all words containing numbers from a string.

For example, given first_text, the expected result is second_text:

first_text = "a2c if3 clean 001mn10 string asw21"
second_text = "clean string"

Upvotes: 3

Views: 3691

Answers (4)

S.C

Reputation: 740

It is easier to select words with no numbers than to select and delete words with numbers:

> library(stringr)
> str1 <- "a2c if3 clean 001mn10 string asw21"
> paste(unlist(str_extract_all(str1, "(\\b[^\\s\\d]+\\b)")), collapse = " ")
[1] "clean string"

Note:

  • Backslashes must be escaped in R string literals, hence the double backslashes
  • \b is a word boundary
  • \s is a whitespace character
  • \d is a digit character
  • a caret (^) at the start of square brackets negates the class: it matches any character that is not listed
  • "+" after the character class means "one or more" occurrences of those (non-whitespace, non-digit) characters
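If you would rather avoid the stringr dependency, the same pattern should also work in base R. A minimal sketch, assuming perl = TRUE so that \b, \s and \d inside the character class are interpreted by PCRE:

```r
# Base-R version of the "keep only words without digits" approach (no stringr needed)
str1 <- "a2c if3 clean 001mn10 string asw21"

# \\b[^\\s\\d]+\\b: a run of non-space, non-digit characters bounded by word boundaries
matches <- regmatches(str1, gregexpr("\\b[^\\s\\d]+\\b", str1, perl = TRUE))[[1]]

# glue the surviving words back together with single spaces
result <- paste(matches, collapse = " ")
print(result)
# [1] "clean string"
```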

Upvotes: 4

s_baldur

Reputation: 33488

This is a bit longer than some of the other answers, but very easy to follow: first split the string into a vector of words, then check each word for digits and keep the ones without any using standard R subsetting.

first_text_vec <- strsplit(first_text, " ")[[1]]
first_text_vec
[1] "a2c"     "if3"     "clean"   "001mn10" "string"  "asw21"  
paste(first_text_vec[!grepl("[0-9]", first_text_vec)], collapse = " ")
[1] "clean string"
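The same split-and-filter idea can be wrapped in a small function that also handles a character vector of several strings. This is only a sketch: the name remove_words_with_digits is hypothetical, and splitting on "\\s+" instead of a single space is an assumption so that runs of spaces do not produce empty "words".

```r
# Hypothetical helper: drop every whitespace-separated word that contains a digit
remove_words_with_digits <- function(x) {
  # one character vector of words per input string
  words <- strsplit(trimws(x), "\\s+")
  # keep digit-free words; a string whose words all contain digits becomes ""
  vapply(words, function(w) {
    paste(w[!grepl("[0-9]", w)], collapse = " ")
  }, character(1))
}

texts <- c("a2c if3 clean 001mn10 string asw21", "no digits here", "r2d2 c3po")
remove_words_with_digits(texts)
# returns c("clean string", "no digits here", "")
```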

Upvotes: 2

Santosh M.

Reputation: 2454

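Just another alternative, using gsub. Removing the words leaves a double space between the remaining ones, so the outer gsub collapses runs of whitespace into a single space:

trimws(gsub("\\s+", " ", gsub("[^\\s]*[0-9][^\\s]*", "", first_text, perl = TRUE)))
#[1] "clean string"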

Upvotes: 3

akrun

Reputation: 887128

Try with gsub

trimws(gsub("\\w*[0-9]+\\w*\\s*", "", first_text))
#[1] "clean string"
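Because gsub is vectorised, the same call works on a whole character vector at once. A quick sketch with made-up inputs:

```r
texts <- c("a2c if3 clean 001mn10 string asw21", "x1 keep this", "r2d2 c3po")

# \\w*[0-9]+\\w* matches a word containing at least one digit;
# the trailing \\s* also removes the whitespace that follows it
cleaned <- trimws(gsub("\\w*[0-9]+\\w*\\s*", "", texts))
cleaned
# cleaned is c("clean string", "keep this", "")
```

Keep in mind that \w only covers letters, digits and the underscore, so a hyphen acts as a word separator with this pattern: "well-known2 word" becomes "well-word".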

Upvotes: 10
