Chirayu Chamoli

Reputation: 2076

Rupee Unicode replacement not working in R

I'm trying to replace the currency symbols in my corpus with text, for example $ with dollar:

x <- "i have \u20AC and \u0024 and \u00A3 and \u00A5 and \u20B9"
"i have € and $ and £ and ¥ and \u20b9"

The Unicode escapes display correctly for every currency except the rupee, which is printed back as \u20b9. What is the problem?

My second issue is that, when doing a gsub, the Unicode replacement works for every symbol except the dollar.

sub('\u0024', 'dollar', x) ## which gives me
"i have € and $ and £ and ¥ and \u20b9dollar"

Replacing dollar could be done using this:

gsub('[$]', 'dollar', x)

Upvotes: 1

Views: 167

Answers (1)

Wiktor Stribiżew

Reputation: 627292

To view your x with the rupee in it, use cat:

> cat(x, sep="\n")
i have € and $ and £ and ¥ and ₹
> 
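The string itself is fine; only the way it is printed differs. print() escapes characters it does not treat as printable on the current platform, while cat() writes the characters as they are. A minimal sketch to confirm that the last character of x really is the rupee sign (the names last_char and code_point are made up for this illustration):

```r
x <- "i have \u20AC and \u0024 and \u00A3 and \u00A5 and \u20B9"

# Take the last character of the string, regardless of how print() shows it
last_char <- substr(x, nchar(x), nchar(x))

# Convert it to its Unicode code point and format it as hex
code_point <- utf8ToInt(last_char)
sprintf("U+%04X", code_point)
# [1] "U+20B9"
```

If this reports U+20B9, the rupee sign is stored correctly and the \u20b9 in the printed output is only a display escape.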

To replace the dollar, use a literal string replacement by adding fixed=TRUE. That way the $ symbol, which denotes the end of string in a regex, does not need to be escaped:

> x <- gsub("$", "dollar", x, fixed=TRUE)
> cat(x, sep="\n")
i have € and dollar and £ and ¥ and ₹
> 

When you do not pass fixed=TRUE, sub and gsub parse the "$" as a regex pattern, and in a regex, $ denotes the end of string. That is why, in your results, dollar is appended after the rupee.
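As a rough sketch, here are three ways to match a literal dollar sign, followed by one possible way to replace all five symbols in a loop. The vectors symbols and words and the replacement words themselves are assumptions made for this example, not something from the question:

```r
x <- "i have \u20AC and \u0024 and \u00A3 and \u00A5 and \u20B9"

# Three ways to match a literal dollar sign
via_fixed  <- gsub("$", "dollar", x, fixed = TRUE)  # pattern is not a regex
via_escape <- gsub("\\$", "dollar", x)              # $ escaped in the regex
via_class  <- gsub("[$]", "dollar", x)              # $ inside a character class

# Hypothetical lookup: symbols[i] is replaced by words[i]
symbols <- c("\u20AC", "\u0024", "\u00A3", "\u00A5", "\u20B9")
words   <- c("euro", "dollar", "pound", "yen", "rupee")

y <- x
for (i in seq_along(symbols)) {
  # fixed = TRUE treats every symbol literally, so $ needs no special case
  y <- gsub(symbols[i], words[i], y, fixed = TRUE)
}
y
# [1] "i have euro and dollar and pound and yen and rupee"
```

Using fixed=TRUE throughout keeps the loop uniform, since none of the symbols then has to be checked for regex meaning.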

Upvotes: 1
