Reputation: 4309
How can I show which special character was a match in each row of the single column dataframe?
Sample dataframe:
a <- data.frame(name=c("foo","bar'","ip_sum","four","%23","2_planet!","@abc!!"))
Determining if the string has a special character:
library(dplyr)
a$name_cleansed <- gsub("([-./&,])|[[:punct:]]","\\1",a$name) # \\1 puts back the exceptions we define (dash, dot, slash, ampersand, comma)
a <- a %>% mutate(has_special_char = name != name_cleansed)
Upvotes: 1
Views: 1186
Reputation: 5788
Base R regex solution using a negated character class ([^...]). Note that a caret outside the brackets is a start-of-string anchor, not a negation, so the excepted characters are listed as a plain alternative:
gsub("[-./&,]|[^[:punct:]]", "", a$name)
Also if you want a data.frame returned:
within(a, {
  special_char <- gsub("[-./&,]|[^[:punct:]]", "", name)
  has_special_char <- special_char != ""
})
If you only want unique special characters per name as in @Ronak Shah's answer:
within(a, {
  # split the remaining symbols into single characters and de-duplicate them
  special_char <- vapply(strsplit(gsub("[-./&,]|[^[:punct:]]", "", name), ""),
                         function(x) toString(unique(x)), character(1))
  has_special_char <- special_char != ""
})
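The position of the caret matters in these patterns: inside brackets it negates the class, outside it anchors to the start of the string. A quick check on made-up names (not from the question), offered as a sketch rather than a definitive test:

```r
# Hypothetical inputs: an excepted "-" in the middle, one at the start, and no symbol at all
x <- c("a-b!", "-ab!", "foo")

# "^" outside the brackets is an anchor, so only a leading excepted character is dropped
anchored <- gsub("(^[-./&,])|[^[:punct:]]", "", x)
anchored
# [1] "-!" "!"  ""

# without the anchor, excepted characters are dropped wherever they occur
unanchored <- gsub("[-./&,]|[^[:punct:]]", "", x)
unanchored
# [1] "!" "!" ""
```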
Upvotes: 0
Reputation: 389012
You can use str_extract if you only want the first special character.
library(stringr)
str_extract(a$name,'[[:punct:]]')
#[1] NA "'" "_" NA "%" "_" "@"
If we need all of the special characters, we can use str_extract_all.
sapply(str_extract_all(a$name,'[[:punct:]]'), function(x) toString(unique(x)))
#[1] "" "'" "_" "" "%" "_, !" "@, !"
To exclude certain symbols, we can use setdiff:
exclude_symbol <- c('-', '.', '/', '&', ',')
sapply(str_extract_all(a$name,'[[:punct:]]'), function(x)
toString(setdiff(unique(x), exclude_symbol)))
Upvotes: 1
Reputation: 521457
We can use grepl here for a base R option:
a$has_special_char <- grepl("(?![-./&,])[[:punct:]]", a$name, perl=TRUE)
a$special_char <- ifelse(a$has_special_char, sub("^.*([[:punct:]]).*$", "\\1", a$name), NA)
a
       name has_special_char special_char
1       foo            FALSE         <NA>
2      bar'             TRUE            '
3    ip_sum             TRUE            _
4      four            FALSE         <NA>
5       %23             TRUE            %
6 2_planet!             TRUE            !
7    @abc!!             TRUE            !
Data:
a <- data.frame(name=c("foo","bar'","ip_sum","four","%23","2_planet!","@abc!!"))
The above logic returns the last symbol character in each name, if present (the leading .* in the sub() pattern is greedy), and NA otherwise. It reuses the has_special_char column to determine whether a symbol occurs in the name. Note that the sub() pattern itself does not apply the exclusion list.
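To pick out the first symbol that is not on the exclusion list explicitly, one option is regexpr() combined with regmatches(). A minimal sketch, assuming the same sample data; the column name first_special_char is made up for illustration:

```r
a <- data.frame(name = c("foo", "bar'", "ip_sum", "four", "%23", "2_planet!", "@abc!!"),
                stringsAsFactors = FALSE)

# punctuation that is not one of the excepted symbols
pattern <- "(?![-./&,])[[:punct:]]"

# regexpr() gives the position of the first match per element, or -1 if there is none
m <- regexpr(pattern, a$name, perl = TRUE)

# regmatches() returns only the matched substrings, so fill them into the matching rows
a$first_special_char <- NA_character_
a$first_special_char[m != -1] <- regmatches(a$name, m)
a$first_special_char
# [1] NA  "'" "_" NA  "%" "_" "@"
```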
Edit:
If you want a column which shows all special characters, then use:
a$all_special_char <- ifelse(a$has_special_char, gsub("[^[:punct:]]+", "", a$name), NA)
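If the excepted symbols should also be left out of that column, gregexpr() with the same lookahead pattern is one way to do it in base R. A sketch on made-up names (the last two are not from the question), with unique symbols joined by toString():

```r
# Hypothetical inputs, two of which contain excepted symbols
b <- data.frame(name = c("2_planet!", "@abc!!", "a-b.c", "x&y#z#"),
                stringsAsFactors = FALSE)

# punctuation that is not one of the excepted symbols
pattern <- "(?![-./&,])[[:punct:]]"

# gregexpr() finds every match; regmatches() returns one character vector per name
hits <- regmatches(b$name, gregexpr(pattern, b$name, perl = TRUE))

# de-duplicate per name and collapse to a single string ("" when nothing matched)
b$all_special_char <- vapply(hits, function(x) toString(unique(x)), character(1))
b$all_special_char
# [1] "_, !" "@, !" ""     "#"
```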
Upvotes: 1