user785099

Reputation: 5563

Reading files that contain UTF-8 characters

I have a CSV file containing Chinese characters, saved as UTF-8.

项目,价格
电视,5000

The first row is the header and the second row is the data. In other words, it is a table with one row and two columns.

I read the file as follows:

amatrix<-read.table("test.csv",encoding="UTF-8",sep=",",header=T,row.names=NULL,stringsAsFactors=FALSE)

However, the output contains unknown characters in the header, i.e., X.U.FEFF


Upvotes: 0

Views: 721

Answers (1)

Hong Ooi

Reputation: 57696

That is the byte order mark sometimes found in Unicode text files. I'm guessing you're on Windows, since that's the only popular OS where files can end up with them.
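One way to confirm this is to look at the first three bytes of the file: a UTF-8 byte order mark is the byte sequence EF BB BF. The following is only a minimal sketch; the helper name has_utf8_bom is made up for illustration, and a temporary file with ASCII stand-in data is used in place of test.csv.

```r
# Hypothetical helper: TRUE if the file starts with the UTF-8 BOM bytes EF BB BF.
has_utf8_bom <- function(path) {
  first <- readBin(path, what = "raw", n = 3L)
  identical(first, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Stand-in for test.csv: a temporary file written with a leading BOM.
bom_file <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("item,price\ntv,5000\n")),
         bom_file)

# The same content without a BOM, for comparison.
plain_file <- tempfile(fileext = ".csv")
writeBin(charToRaw("item,price\ntv,5000\n"), plain_file)

has_utf8_bom(bom_file)
has_utf8_bom(plain_file)
```

If the check returns TRUE for your own file, the X.U.FEFF prefix in the header comes from those three bytes being read as part of the first column name.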

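What you can do is read the file using readLines and strip the BOM from the start of the first line. Once the text is marked as UTF-8, the BOM is a single character (U+FEFF), even though it takes up three bytes in the file.

txt <- readLines("test.csv", encoding="UTF-8")
# drop a leading BOM (U+FEFF) if present; the other lines are left untouched
txt[1] <- sub("^\ufeff", "", txt[1])
amatrix <- read.csv(text=txt, ...)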
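An alternative that avoids editing the text by hand is to let the file connection drop the BOM. R's connections accept the encoding name "UTF-8-BOM" (see ?file), and read.table passes fileEncoding through to the connection. The sketch below is not part of the original answer; it uses a temporary file with ASCII stand-in data (item/price instead of the Chinese header) so that it runs the same way in any locale.

```r
# Stand-in for test.csv: a small CSV that starts with the UTF-8 BOM (EF BB BF).
csv_path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("item,price\ntv,5000\n")),
         csv_path)

# fileEncoding = "UTF-8-BOM" makes the connection skip a leading BOM if present,
# so the first column name is read without the X.U.FEFF prefix.
amatrix <- read.table(csv_path, fileEncoding = "UTF-8-BOM", sep = ",",
                      header = TRUE, row.names = NULL,
                      stringsAsFactors = FALSE)

names(amatrix)
```

Note that this uses fileEncoding rather than encoding: encoding only marks the strings that were read as UTF-8, while fileEncoding controls how the file itself is decoded.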

Upvotes: 1
