Reputation: 383
I have a data frame with special characters, as shown below:
Key Q1 Q2
22 aSk aÃ…Â k
23 aSk aÃ…Â k
24 aSk aÃ…Â k
I would like to replace the "Ã…Â " (including the space before the "k") in Q2 with "S", so that Q2 reads "aSk", the same as Q1:
Key Q1 Q2
22 aSk aSk
23 aSk aSk
24 aSk aSk
I have tried the gsub function in R:
df$Q2 <- gsub("[Ã…Â]", "S", df$Q2)
but the space is not removed, and I get the result below instead:
Key Q1 Q2
22 aSk aSSS k
23 aSk aSSS k
24 aSk aSSS k
Can I know what's wrong with my code and how to remove the "space" and "SSS" in R?
(The actual word in my raw file in csv is "aÅ k". However, it appears as "aÃ…Â k" in R)
Thanks.
Upvotes: 1
Views: 6416
Reputation: 886948
We can match one or more characters that are not ASCII letters and replace the whole run with a single "S":
df$Q2 <- sub("[^A-Za-z]+", "S", df$Q2)
df$Q2
#[1] "aSk" "aSk" "aSk"
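To try this without the original CSV, here is a minimal, self-contained sketch. It rebuilds the example data with the garbled characters written as Unicode escapes, and it assumes the gap before the "k" is an ordinary space (in the real file it might be a non-breaking space, which the negated class would also catch). The variable name `garbled` is only for illustration.

```r
# Rebuild the example data. The garbled characters are written as Unicode
# escapes so the script does not depend on this file's own encoding.
# Assumption: the gap before "k" is an ordinary space.
garbled <- "a\u00c3\u2026\u00c2 k"

df <- data.frame(
  Key = 22:24,
  Q1  = "aSk",
  Q2  = garbled,
  stringsAsFactors = FALSE
)

# Replace the first run of characters that are not ASCII letters with "S".
df$Q2 <- sub("[^A-Za-z]+", "S", df$Q2)

df$Q2
# [1] "aSk" "aSk" "aSk"
```

Because the pattern is negated, it does not need to list the garbled characters at all, so it keeps working if they print differently on another machine.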
Or we capture only the alphabetic characters as a group (([A-Za-z]*)) from the start (^) of the string, match the non-alphabetic characters that follow, and replace the match with the backreference to the captured group followed by "S":
sub("^([A-Za-z]*)[^A-Za-z]+", "\\1S", df$Q2)
#[1] "aSk" "aSk" "aSk"
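As for why the attempt in the question produced "aSSS k": a character class without a quantifier matches one character at a time, so gsub swaps each of the three garbled characters for its own "S", and the space is untouched because it is not in the class. The sketch below illustrates this on a single string, again assuming an ordinary space; the names `x`, `attempt`, `whole_run`, and `literal` are only for illustration.

```r
# One garbled value, written with Unicode escapes.
# Assumption: the gap before "k" is an ordinary space.
x <- "a\u00c3\u2026\u00c2 k"

# The attempt from the question: each character in the class is replaced
# separately, and the space is not in the class, so it stays.
attempt <- gsub("[\u00c3\u2026\u00c2]", "S", x)
attempt
# [1] "aSSS k"

# Adding the space to the class and a "+" quantifier makes the whole run
# a single match, so it is replaced by one "S".
whole_run <- gsub("[\u00c3\u2026\u00c2 ]+", "S", x)
whole_run
# [1] "aSk"

# Alternatively, match the exact sequence literally instead of as a regex.
literal <- sub("\u00c3\u2026\u00c2 ", "S", x, fixed = TRUE)
literal
# [1] "aSk"
```

The negated class used above is the more forgiving choice when the exact garbled characters are uncertain; the literal match is stricter and leaves any other value unchanged.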
Upvotes: 1