89_Simple

Reputation: 3815

find alphanumeric elements in vector

I have a vector:

    myVec <- c('1.2','asd','gkd','232','4343','1.3zyz','fva','3213','1232','dasd')

In this vector, I want to do two things:

  1. Remove any numbers (including the decimal point) from elements that contain both numbers and letters, and then
  2. If an element made up only of letters is followed by another such element, merge them into one.

So the above vector will look like this:

    '1.2','asdgkd','232','4343','zyzfva','3213','1232','dasd'

My plan was to first find the alphanumeric elements and then remove the numbers from them with gsub. I tried this:

    gsub('[0-9]+', '', myVec[grepl("[A-Za-z]+$", myVec, perl = T)])

    "asd"  "gkd"  ".zyz" "fva"  "dasd"

That is, it keeps the `.` from `1.3zyz`, which I don't want.
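The `.` survives because the character class `[0-9]` matches digits only. A minimal sketch of the same attempt with the decimal point added to the class, assuming numbers always use `.` as the decimal separator (inside a bracket expression `.` is a literal dot, not "any character"):

```r
myVec <- c('1.2','asd','gkd','232','4343','1.3zyz','fva','3213','1232','dasd')

# Elements that end in letters, as in the attempt above
ends_in_letters <- grepl("[A-Za-z]+$", myVec)

# "[0-9.]+" matches runs of digits AND decimal points, so "1.3zyz" becomes "zyz"
stripped <- gsub("[0-9.]+", "", myVec[ends_in_letters])

# stripped is c("asd", "gkd", "zyz", "fva", "dasd")
print(stripped)
```

This only fixes step 1; the letter-only elements still have to be merged afterwards.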

Upvotes: 2

Views: 246

Answers (2)

MarkusN

Reputation: 3223

Here's my regex-only solution:

    myVec <- c('1.2','asd','gkd','232','4343','1.3zyz','fva','3213','1232','dasd')

    # find all elements containing letters
    lettrs <- grepl("[A-Za-z]", myVec)

    # remove all non-letter characters from those elements
    myVec[lettrs] <- gsub("[^A-Za-z]", "", myVec[lettrs])

    # paste all elements together, remove the delimiter wherever it is surrounded by letters,
    # and split the string back into a vector
    unlist(strsplit(gsub("(?<=[A-Za-z])\\|(?=[A-Za-z])", "", paste(myVec, collapse="|"), perl=TRUE), split="\\|"))
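To see what the paste/lookaround trick does, here is a sketch of the same steps with the intermediate strings kept in variables. Like the answer, it assumes `|` never occurs inside an element; if it can, a different delimiter is needed. Splitting with `fixed = TRUE` instead of the regex `"\\|"` is just a stylistic choice here.

```r
myVec <- c('1.2','asd','gkd','232','4343','1.3zyz','fva','3213','1232','dasd')

# Step 1: strip non-letters from elements that contain letters
lettrs <- grepl("[A-Za-z]", myVec)
myVec[lettrs] <- gsub("[^A-Za-z]", "", myVec[lettrs])

# Step 2: join everything with "|" as a delimiter
joined <- paste(myVec, collapse = "|")
# joined is "1.2|asd|gkd|232|4343|zyz|fva|3213|1232|dasd"

# Step 3: drop each "|" that has a letter on both sides (PCRE lookbehind/lookahead)
merged <- gsub("(?<=[A-Za-z])\\|(?=[A-Za-z])", "", joined, perl = TRUE)
# merged is "1.2|asdgkd|232|4343|zyzfva|3213|1232|dasd"

# Step 4: split back into a character vector on the literal "|"
result <- strsplit(merged, split = "|", fixed = TRUE)[[1]]
print(result)
```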

Upvotes: 1

MrFlick

Reputation: 206606

This seems to return what you are after:

    myVec <- c('1.2','asd','gkd','232','4343','1.3zyz','fva','3213','1232','dasd')

    clean <- function(x) {
      is_char <- grepl("[[:alpha:]]", x)
      has_number <- grepl("\\d", x)
      # elements that mix letters and numbers: strip digits and decimal points
      mixed <- is_char & has_number
      x[mixed] <- gsub("[\\d\\.]+", "", x[mixed], perl = TRUE)
      # start a new group unless a letter element follows another letter element
      grp <- cumsum(!is_char | (is_char & !c(FALSE, head(is_char, -1))))
      # paste the elements of each group together
      unname(tapply(x, grp, paste, collapse = ""))
    }

    clean(myVec)
    # [1] "1.2"    "asdgkd" "232"    "4343"   "zyzfva" "3213"   "1232"   "dasd" 

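First we find the elements that mix numbers and letters and strip the digits and decimal points from them. Then we define groups for collapsing: a new group starts at every element without letters and at every letter element that does not directly follow another letter element, so consecutive letter elements end up in the same group. Finally we collapse all the values in each group into one string.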
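The grouping step is easiest to follow by computing it on the example input outside the function. This sketch reuses the names from `clean`; `prev_is_char` and `starts_group` are introduced here only for illustration:

```r
myVec <- c('1.2','asd','gkd','232','4343','1.3zyz','fva','3213','1232','dasd')

is_char <- grepl("[[:alpha:]]", myVec)
# Was the previous element also a letter element? (FALSE for the first one)
prev_is_char <- c(FALSE, head(is_char, -1))

# A new group starts at every non-letter element and at every letter element
# that does not directly follow another letter element
starts_group <- !is_char | (is_char & !prev_is_char)
grp <- cumsum(starts_group)

# grp is c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8): "asd"/"gkd" share group 2, "1.3zyz"/"fva" share group 5
print(grp)
```

The condition `!is_char | (is_char & !prev_is_char)` is logically equivalent to `!(is_char & prev_is_char)`, which may read more directly as "start a new group unless this and the previous element both contain letters".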

Upvotes: 5
