Reputation: 541
I have two issues related to Swedish characters. I am fetching data directly from an MS SQL database. 1. Could anyone give me a hint on how to change the strings back to Swedish characters in R?
I used write.csv to write the data out to a CSV file, then copied and pasted the strings here to build the df as follows:
library(tidyverse)
library(ggplot2)
library(scales)
c <- c("c","u","m","j","c","u","m","j","c","u","m","j")
city <- c("G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping")
priority <- c(1,1,1,1,0,0,0,0,2,3,3,2)
n_cust <- sample(50:1000, 12, replace=T)
df <- data.frame(c,city,priority,n_cust)
<f6> should be ö and <e5> should be å.
dpri %>%
  group_by(kommun, artikel_prioritet) %>%
  summarise(n_cust = n_distinct(kund_id),
            sum_sales = sum(p_sum_adj_sale),
            avg_margin = mean(pp_avg_margin),
            avg_pec_sales = mean(p_pec_sales)) %>%
  arrange(desc(sum_sales)) %>%
  head(20) %>%
  ggplot(aes(x = reorder(kommun, sum_sales), y = sum_sales, fill = factor(artikel_prioritet))) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  facet_grid(. ~ factor(artikel_prioritet), scales = "free") +
  theme(legend.position = "none")
I got this error: Error in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : invalid input 'Göteborg' in 'utf8towcs'
If I first store the head(20) result in a variable ci and then plot ci with ggplot:
ggplot(ci,aes(x=reorder(kommun, sum_sales), y=sum_sales, fill=factor(artikel_prioritet))) + geom_bar(stat='identity')+
coord_flip()+ scale_y_continuous(labels = comma)+ facet_grid(.~ factor(artikel_prioritet), scales = "free")+
theme(legend.position="none")
I get a bar chart without any city labels. When I then print out ci, I get the following picture:
Then I wrote the head(20) result to a CSV file, 'cityname.csv', read it back into R with read.csv, and used the same code to draw the bar chart:
ci <- read.csv("cityname.csv")
ggplot(ci,aes(x=reorder(kommun, sum_sales), y=sum_sales, fill=factor(artikel_prioritet))) + geom_bar(stat='identity')+
coord_flip()+ scale_y_continuous(labels = comma)+ facet_grid(.~ factor(artikel_prioritet), scales = "free")+
theme(legend.position="none")
This time the labels are visible, but the Swedish characters are still wrong. I would appreciate suggestions on how to fix the Swedish strings. Is there also a way to get the bar chart fixed without writing the data out with write.csv and reading it back in?
Thank you!
Upvotes: 2
Views: 273
Reputation: 1509
I believe your issue is that R doesn't know how to interpret your character encoding. Try \u notation instead of <>. A \uXXXX escape names a Unicode code point, and R marks the resulting string as UTF-8:
> city <- c("G\u00f6teborg", "Ume\u00e5", "Malm\u00f6", "J\u00f6nk\u00f6ping","G\u00f6teborg", "Ume\u00e5", "Malm\u00f6", "J\u00f6nk\u00f6ping","G\u00f6teborg", "Ume\u00e5", "Malm\u00f6", "J\u00f6nk\u00f6ping")
> Encoding(city)
[1] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
> head(city)
[1] "Göteborg" "Umeå" "Malmö" "Jönköping" "Göteborg" "Umeå"
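If the strings arrive from the database as Latin-1 bytes without an encoding mark (an assumption; the question does not say which encoding the driver returns), re-encoding them in place may avoid the write.csv/read.csv round trip. A minimal sketch, with the variable names raw_city and fixed_city made up for illustration:

```r
# Simulate a string holding Latin-1 bytes with no encoding mark.
# ASSUMPTION: this mimics what the database driver hands back.
# Bytes: G, 0xf6 (ö in Latin-1), t, e, b, o, r, g
raw_city <- rawToChar(as.raw(c(0x47, 0xf6, 0x74, 0x65, 0x62, 0x6f, 0x72, 0x67)))

# Re-encode from the assumed source encoding to UTF-8.
fixed_city <- iconv(raw_city, from = "latin1", to = "UTF-8")

# The result is marked as UTF-8 and matches the \u spelling.
Encoding(fixed_city)
identical(fixed_city, "G\u00f6teborg")
```

On the real data the same call could be applied to the column before plotting, for example dpri$kommun <- iconv(dpri$kommun, from = "latin1", to = "UTF-8"). If the column is actually in another encoding, the from argument would need to change accordingly.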
EDIT:
You asked a good follow-up question about how to make this replacement programmatically. I have provided a solution for that as well below, using the tidyverse packages dplyr and stringr:
> library(dplyr)  # provides %>% and mutate()
> city <- c("G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping","G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping")
> city_df <- as.data.frame(city)
> # Map each <xx> code to the matching \u escape
> special_character_replacements <- c("<f6>" = "\\u00f6", "<e5>" = "\\u00e5")
> city_df %>%
    dplyr::mutate(city_fixed =
      stringr::str_replace_all(city, special_character_replacements))
              city city_fixed
1      G<f6>teborg   Göteborg
2          Ume<e5>       Umeå
3         Malm<f6>      Malmö
4  J<f6>nk<f6>ping  Jönköping
5      G<f6>teborg   Göteborg
6          Ume<e5>       Umeå
7         Malm<f6>      Malmö
8  J<f6>nk<f6>ping  Jönköping
9      G<f6>teborg   Göteborg
10         Ume<e5>       Umeå
11        Malm<f6>      Malmö
12 J<f6>nk<f6>ping  Jönköping
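The lookup table above needs one entry per special character. As a sketch of a more general alternative in base R, every two-digit <xx> code can be decoded in one pass, on the assumption that the codes are Latin-1 byte values (which coincide with Unicode code points U+0000 to U+00FF). The helper name decode_hex_escapes is hypothetical, not part of any package:

```r
# Hypothetical helper: replace every "<xx>" hex code with the character
# at that code point. ASSUMPTION: the codes are Latin-1 byte values.
decode_hex_escapes <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    # Locate all two-digit hex codes such as <f6> or <e5>
    m <- gregexpr("<[0-9a-fA-F]{2}>", s)
    codes <- regmatches(s, m)[[1]]
    if (length(codes) > 0) {
      # Convert the hex digits to an integer, then to a UTF-8 character
      chars <- vapply(codes, function(code) {
        intToUtf8(strtoi(substr(code, 2, 3), base = 16L))
      }, character(1), USE.NAMES = FALSE)
      regmatches(s, m) <- list(chars)
    }
    s
  }, character(1), USE.NAMES = FALSE)
}

city <- c("G<f6>teborg", "Ume<e5>", "Malm<f6>", "J<f6>nk<f6>ping")
city_fixed <- decode_hex_escapes(city)
```

Strings without any <xx> code pass through unchanged, so the helper could be applied to a whole column.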
Upvotes: 0