Reputation: 4991
I am using the following code for importing special characters in R:
Encoding(self$Data$Skills) <- "UTF-8"
But when I change the name of the column with:
colnames(self$Data) <- 'skills2'
and run again:
Encoding(self$Data$skills2) <- "UTF-8"
I have the following error:
Error in `Encoding<-`(`*tmp*`, value = "UTF-8") :
a character vector argument expected
I do not understand why this is happening. Any ideas? The same thing also happens when I sample rows from this data frame using:
self$Data <- data.frame(df[sample(nrow(self$Data),dim(self$Data)[1]*samplePersentance),])
The column name changes, and when I call the Encoding function I get the same error. The data is imported using the read.csv function.
Edit: Head of the data
Skills
1 null
2 "'"
3 "'Fin Gaap'"
4 "'Knæ-igennem-hinanden-tr..."
5 "'Mønt-dans-på-knoerne-tr..."
6 "'Necessary knowledge of..."
> typeof(self$Data)
[1] "list"
> class(self$Data)
[1] "data.frame"
And to reproduce the error:
try1 <- structure(list(Skills = c("null", "\"'\"", "\"'Fin Gaap'\"",
"\"'Knæ-igennem-hinanden-tr...\"", "\"'Mønt-dans-på-knoerne-tr...\"",
"\"'Necessary knowledge of...\"")), .Names = "Skills", row.names = c(NA,
6L), class = "data.frame")
Encoding(try1$Skills) <- 'UTF-8'
#the function runs normally
try2 <- data.frame(try1[sample(nrow(try1),floor(dim(try1)[1]*0.5)),])
colnames(try2) <- 'skills2'
Encoding(try2$skills2) <- 'UTF-8'
#the function throws an error
> typeof(try1$Skills)
[1] "character"
> typeof(try2$skills2)
[1] "integer"
Upvotes: 1
Views: 7166
Reputation: 132576
The problem is that data.frame, with its default stringsAsFactors = TRUE (the default in R versions before 4.0.0), turns the column into a factor:
try2 <- data.frame(try1[sample(nrow(try1),floor(dim(try1)[1]*0.5)),])
colnames(try2) <- 'skills2'
str(try2)
#'data.frame': 3 obs. of 1 variable:
# $ skills2: Factor w/ 3 levels "\"'\"","\"'Fin Gaap'\"",..: 3 1 2
Encoding(try2$skills2) <- 'UTF-8'
#Error in `Encoding<-`(`*tmp*`, value = "UTF-8") :
# a character vector argument expected
try2$skills2 <- as.character(try2$skills2)
Encoding(try2$skills2) <- 'UTF-8'
#works
Of course, you don't need data.frame in that line at all ...
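One way to read that last remark, as a rough sketch rather than the definitive fix: subsetting a one-column data frame by rows drops it to a plain vector by default, which is why the question wraps it in data.frame again. Passing drop = FALSE keeps it a data frame with its character column intact. The sample values and the names idx and try3 below are made up for illustration.

```r
# Illustrative one-column data frame; the values are placeholders.
try1 <- data.frame(
  Skills = c("null", "'Fin Gaap'", "Kn\u00e6", "M\u00f8nt", "a", "b"),
  stringsAsFactors = FALSE
)

# Pick half of the rows at random, as in the question.
idx <- sample(nrow(try1), floor(nrow(try1) * 0.5))

# drop = FALSE keeps the result a data frame, so no data.frame() wrapper
# is needed and the column is never converted to a factor.
try2 <- try1[idx, , drop = FALSE]
colnames(try2) <- "skills2"
Encoding(try2$skills2) <- "UTF-8"

# Alternative: keep the data.frame() wrapper but switch off factor
# conversion explicitly (only needed where the default is TRUE).
try3 <- data.frame(skills2 = try1[idx, ], stringsAsFactors = FALSE)
Encoding(try3$skills2) <- "UTF-8"
```

Both variants leave skills2 as a character vector, so Encoding<- accepts it without the as.character conversion.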
Upvotes: 2