Ibo

Reputation: 4309

R: show matched special character in a string

How can I show which special character was matched in each row of a single-column data frame?

Sample dataframe:

a <- data.frame(name=c("foo","bar'","ip_sum","four","%23","2_planet!","@abc!!"))

Determining whether the string has a special character:

a$name_cleansed <- gsub("([-./&,])|[[:punct:]]","\\1",a$name) # \\1 puts back the exceptions we define (- . / & ,)

a <- a %>% mutate(has_special_char = name != name_cleansed) # %>% and mutate() need library(dplyr)


Upvotes: 1

Views: 1186

Answers (3)

hello_friend

Reputation: 5788

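Base R regex solution. The pattern removes the allowed symbols and, through the negated class [^[:punct:]], every character that is not punctuation, so only the special characters remain. A caret outside brackets, as in (^[-./&,]), would anchor to the start of the string rather than negate, so the allowed symbols are listed in a plain class here:

gsub("[-./&,]|[^[:punct:]]", "", a$name)

Also if you want a data.frame returned:

within(a, {
  special_char <- gsub("[-./&,]|[^[:punct:]]", "", name)
  has_special_char <- special_char != ""
})

If you only want unique special characters per name as in @Ronak Shah's answer:

within(a, {
  # split the leftover symbols into single characters and keep each one once
  special_char <- sapply(gsub("[-./&,]|[^[:punct:]]", "", name),
                         function(x) toString(unique(unlist(strsplit(x, "")))),
                         USE.NAMES = FALSE)
  has_special_char <- special_char != ""
})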
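A caret means different things depending on where it sits: at the start of a bracket expression it negates the class, while outside brackets it anchors the match to the start of the string. A minimal sketch of the difference, where the vector x is made up for illustration and is not part of the question's data:

```r
# Hypothetical inputs chosen so that allowed symbols appear both at the
# start of a string and in the middle of one.
x <- c("-a-b!", "2_planet!", "a.b,c")

# Caret outside brackets: "(^[-./&,])" only matches an allowed symbol
# sitting at the very start of the string.
anchored <- gsub("(^[-./&,])|[^[:punct:]]", "", x)

# No anchor: allowed symbols are removed wherever they occur.
anywhere <- gsub("[-./&,]|[^[:punct:]]", "", x)

anchored
# [1] "-!" "_!" ".,"
anywhere
# [1] "!"  "_!" ""
```

With the anchored form, allowed symbols in the middle of a name are still reported as special characters, which is probably not what is wanted.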

Upvotes: 0

Ronak Shah

Reputation: 389012

You can use str_extract if you want only the first special character.

library(stringr)
str_extract(a$name,'[[:punct:]]')
#[1] NA  "'" "_" NA  "%" "_" "@"

If we need all of the special characters we can use str_extract_all.

sapply(str_extract_all(a$name,'[[:punct:]]'), function(x) toString(unique(x)))
#[1] ""     "'"    "_"    ""     "%"    "_, !" "@, !"

To exclude certain symbols, we can use

exclude_symbol <- c('-', '.', '/', '&', ',')

sapply(str_extract_all(a$name,'[[:punct:]]'), function(x) 
                       toString(setdiff(unique(x), exclude_symbol)))
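If stringr is not available, roughly the same result can be obtained in base R with gregexpr() and regmatches(). This is only a sketch: the vector names_vec repeats the question's sample names, and the last entry, "a-b.c!", is a hypothetical addition to show the exclusion taking effect.

```r
# Sample names from the question plus one made-up entry ("a-b.c!")
# that contains excluded symbols.
names_vec <- c("foo", "bar'", "ip_sum", "four", "%23",
               "2_planet!", "@abc!!", "a-b.c!")
exclude_symbol <- c("-", ".", "/", "&", ",")

# gregexpr() locates every punctuation character; regmatches() extracts them
# as a list with one character vector per name.
found <- regmatches(names_vec, gregexpr("[[:punct:]]", names_vec))

# Keep each symbol once per name and drop the excluded ones.
special <- sapply(found, function(x) toString(setdiff(unique(x), exclude_symbol)))
special
# [1] ""     "'"    "_"    ""     "%"    "_, !" "@, !" "!"
```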

Upvotes: 1

Tim Biegeleisen

Reputation: 521457

We can use grepl here for a base R option:

a$has_special_char <- grepl("(?![-./&,])[[:punct:]]", a$name, perl=TRUE)
a$special_char <- ifelse(a$has_special_char, sub("^.*([[:punct:]]).*$", "\\1", a$name), NA)
a

       name has_special_char special_char
1       foo            FALSE         <NA>
2      bar'             TRUE            '
3    ip_sum             TRUE            _
4      four            FALSE         <NA>
5       %23             TRUE            %
6 2_planet!             TRUE            !
7    @abc!!             TRUE            !

Data:

a <- data.frame(name=c("foo","bar'","ip_sum","four","%23","2_planet!","@abc!!"))

The above logic returns the last symbol character, if present, in each name (the leading .* in the sub() pattern is greedy, so the capture group lands on the final punctuation character), otherwise returning NA. It reuses the has_special_char column to determine if a symbol occurs in the name already.
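Which symbol sub() captures depends on whether the leading .* is greedy or lazy. A short sketch of the two variants, using two of the sample names; the lazy form with perl=TRUE is a suggested alternative, not part of the original answer:

```r
x <- c("2_planet!", "@abc!!")

# Greedy ".*" consumes as much as possible, so the group captures the
# LAST punctuation character in each string.
last_sym <- sub("^.*([[:punct:]]).*$", "\\1", x)

# Lazy ".*?" consumes as little as possible, so the group captures the
# FIRST punctuation character instead.
first_sym <- sub("^.*?([[:punct:]]).*$", "\\1", x, perl = TRUE)

last_sym
# [1] "!" "!"
first_sym
# [1] "_" "@"
```

Neither variant skips the excluded symbols, so a name such as "a!-b" would still yield "-" with the greedy form.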

Edit:

If you want a column which shows all special characters, then use:

a$all_special_char <- ifelse(a$has_special_char, gsub("[^[:punct:]]+", "", a$name), NA)

Upvotes: 1
