I have a list of words, which I got from the code below:
tags_vector <- unlist(tags_used)
Some of the strings in this vector have an ellipsis at the end, which I want to remove. Here I print the 5th element and its class:
tags_vector[5]
#[1] "#b…"
class(tags_vector[5])
#[1] "character"
I am trying to remove the ellipsis from this 5th element using gsub():
gsub("[…]", "", tags_vector[5])
#[1] "#b…"
This code doesn't work: I still get "#b…" as output. But when I put the value of the 5th element directly into the same code, it works fine:
gsub("[…]", "", "#b…")
#[1] "#b"
I even tried storing the value of tags_vector[5] in a variable x1 and using that in gsub(), but it still didn't work.
It might be a Unicode issue. In R (and RStudio), characters that look identical are not always stored with the same bytes.
I tried to create a reproducible example:
# create the ellipsis from the definition (similar to your tags_used)
> ell_def <- rawToChar(as.raw(c(0xE2, 0x80, 0xA6))) # UTF-8 bytes from the unicode definition here: http://www.fileformat.info/info/unicode/char/2026/index.htm
> Encoding(ell_def) <- 'UTF-8'
> ell_def
[1] "…"
> Encoding(ell_def)
[1] "UTF-8"
# create the ellipsis from text (similar to your string)
> ell_text <- '…'
> ell_text
[1] "…"
> Encoding(ell_text)
[1] "latin1"
# show that you can get strange results
> gsub(ell_text,'',ell_def)
[1] "…"
Whether this example reproduces depends on your locale. In my case, I work in windows-1252, since at the time the locale could not be set to UTF-8 on Windows (R 4.2.0 and later can use UTF-8 as the native encoding on recent Windows versions). According to the stringi documentation, "R lets strings in ASCII, UTF-8, and your platform's native encoding coexist peacefully". As the example above shows, this coexistence can give contradictory results: gsub() fails to match a pattern that prints exactly like the text it is searching.
In short, the two strings look the same when printed, but they differ at the byte level.
If I run this example in the plain R terminal, I get similar results, except that the ellipsis is displayed as a dot: ".".
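To look at the byte-level difference directly, here is a minimal sketch. It assumes your iconv() accepts the encoding name "CP1252" (windows-1252), in which the ellipsis is the single byte 0x85, while UTF-8 encodes it as the three bytes E2 80 A6. The variable names are only illustrative:

```r
# The horizontal ellipsis written as a Unicode escape is stored as UTF-8
ell_utf8 <- "\u2026"
# The same character re-encoded to windows-1252 (assumes iconv knows "CP1252")
ell_cp1252 <- iconv(ell_utf8, from = "UTF-8", to = "CP1252")

charToRaw(ell_utf8)    # e2 80 a6
charToRaw(ell_cp1252)  # 85

# After converting the windows-1252 version back to UTF-8, both compare equal
identical(ell_utf8, iconv(ell_cp1252, from = "CP1252", to = "UTF-8"))  # TRUE
```

Running charToRaw(tags_vector[5]) and Encoding(tags_vector[5]) on your own data should show which bytes and encoding mark your string actually carries.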
A quick fix for your example is to use the byte-defined ellipsis as the gsub() pattern, e.g.:
gsub(ell_def,'',tags_vector[5])
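A slightly more defensive variant, sketched here as an alternative rather than something tested on your data, is to convert the whole vector to UTF-8 with enc2utf8() and write the pattern as the escape "\u2026", so the pattern no longer depends on how your script or console text was read. fixed = TRUE makes gsub() match the ellipsis literally instead of as a regular expression. The vector x below is a hypothetical stand-in for tags_vector:

```r
# Hypothetical stand-in for tags_vector
x <- c("#b\u2026", "#abc", "tag\u2026")

# Convert every element to UTF-8 (from its declared encoding, or from the
# native encoding for strings marked "unknown")
x_utf8 <- enc2utf8(x)

# Remove every ellipsis; fixed = TRUE treats the pattern as a literal string
cleaned <- gsub("\u2026", "", x_utf8, fixed = TRUE)
cleaned  # c("#b", "#abc", "tag")
```

If you only want to strip an ellipsis at the end of each string, sub("\u2026$", "", x_utf8) anchors the match there instead.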