Reputation: 17647
I am having trouble reading a CSV file with R.
x=read.csv("LorenzoFerrone.csv",header=T)
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>N'
I can read the file using libre office with no problems.
I cannot upload the file because it is full of sensitive information.
What can I do?
Setting the encoding seems to be the solution to the problem.
> x=read.csv("LorenzoFerrone.csv",fileEncoding = "UCS-2LE")
> x[2,1]
[1] Adriano Caruso
100 Levels: Ada Adriano Caruso adriano diaz Adriano Diaz alberto ferrone Alexey ... Zia Tina
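For what it is worth, the <ff><fe> bytes in the error message are the byte-order mark (BOM) of a UTF-16 little-endian file, which is why a UCS-2LE or UTF-16LE fileEncoding helps here. Below is a minimal sketch, not a definitive implementation, that peeks at the first bytes of a file and guesses a fileEncoding from the BOM. The helper name guess_bom_encoding and the temporary demo file are made up for illustration; a file without a BOM returns NA and needs another approach.

```r
# Hypothetical helper (not part of base R): guess a fileEncoding from the BOM.
guess_bom_encoding <- function(path) {
  bom <- readBin(path, what = "raw", n = 3)
  starts_with <- function(bytes) {
    length(bom) >= length(bytes) &&
      all(bom[seq_along(bytes)] == as.raw(bytes))
  }
  if (starts_with(c(0xEF, 0xBB, 0xBF))) {
    "UTF-8-BOM"
  } else if (starts_with(c(0xFF, 0xFE))) {
    "UTF-16LE"   # note: a UTF-32LE BOM also starts with these two bytes
  } else if (starts_with(c(0xFE, 0xFF))) {
    "UTF-16BE"
  } else {
    NA_character_
  }
}

# Demo on a small, made-up UTF-16LE file that starts with a BOM.
demo_path <- tempfile(fileext = ".csv")
payload <- iconv("Name,Age\nAda,30\n", from = "UTF-8", to = "UTF-16LE",
                 toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xFF, 0xFE)), payload), demo_path)

enc <- guess_bom_encoding(demo_path)
x <- read.csv(demo_path, fileEncoding = enc)
```

On a real file you would call guess_bom_encoding("LorenzoFerrone.csv") and pass the result to fileEncoding only if it is not NA.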
Upvotes: 24
Views: 123922
Reputation: 497
You can try the "Latin1" encoding when reading the CSV:
x = read.csv("LorenzoFerrone.csv", fileEncoding = "Latin1", check.names = FALSE)
I added check.names = FALSE to stop R from replacing spaces with dots in the header.
As the above does not always work, another option is to use the data.table library with the Latin-1 encoding:
library(data.table)
x = fread("LorenzoFerrone.csv", encoding = "Latin-1")
Upvotes: 9
Reputation: 51
I found that this problem is caused by the file's encoding. I solved it by opening the file in Windows Notepad, saving it as UTF-8, reopening it in Excel (it looked garbled at first), and saving it as UTF-8 again. Then it worked!
Upvotes: 5
Reputation: 353
This reads the column names as-is, which skips the make.names call that raises the error:
x = read.csv("LorenzoFerrone.csv", check.names = FALSE)
To remove troublesome characters from the column names, use this:
names(x) = iconv(names(x), to = "ASCII", sub = "")
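To see what that iconv call does before running it on real headers, here is a small sketch on made-up column names. Note that sub = "" deletes the non-ASCII characters outright; it does not replace them with an unaccented letter, so the cleaned names can end up shorter than you expect. The header values are invented for illustration, and from = "UTF-8" is stated explicitly as an assumption about the input.

```r
# Made-up header names containing accented characters, written as \u escapes
# so the example does not depend on the encoding of this script.
headers <- c("M\u00e9xico", "A\u00f1o", "Total")

# sub = "" drops every byte that cannot be represented in ASCII.
cleaned <- iconv(headers, from = "UTF-8", to = "ASCII", sub = "")
print(cleaned)
```

If you would rather keep a visible marker where a character was removed, pass something like sub = "?" instead of the empty string.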
Upvotes: 22
Reputation: 13
I know this is an old post, but I just wanted to say to non-native English speakers that if you use "," as the decimal separator,
Upvotes: 0
Reputation: 1
I solved the problem by removing the diacritics (i.e. accent marks) from the text. My headers were written in Spanish and contained some accent marks. I replaced them with unaccented spellings (México became Mexico) and the problem was solved.
Upvotes: 0
Reputation: 137
Not sure if this helps, just had a similar issue which I solved by removing " from the csv I was trying to import. The first row of the database had the column names written as "colname","colname2","etc" and I removed all the " and the csv was read in R just fine then.
Upvotes: 0
Reputation: 36
You need to specify the correct delimiter in the sep argument.
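As a rough illustration of the sep argument, the sketch below reads made-up CSV text that uses ";" between fields and "," as the decimal mark, which is common in many European locales. The data is invented; read.csv2 uses these same two settings as its defaults, so it is an alternative to spelling them out.

```r
# Made-up CSV text that uses ";" between fields and "," as the decimal mark.
csv_text <- "name;score\nAda;1,5\nZia Tina;2,25\n"

# Tell read.csv about both separators explicitly.
x <- read.csv(text = csv_text, sep = ";", dec = ",")
str(x)
```

With a real file, replace the text argument with the file name and keep sep and dec as they are.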
Upvotes: 2
Reputation: 69
This is typically an encoding issue. You can try changing the encoding, or else delete the offending character (use your favorite editor and replace all instances). In some cases R will report the character position, for example:
invalid multibyte string 1847
That should make your life easier. Note that you may need to repeat this process several times (deleting all offending characters or trying several encodings).
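If R does not report a position, one way to narrow things down is to check each line yourself. The sketch below assumes the file is supposed to be UTF-8 and uses base R's validUTF8 to list the lines that are not; the demo file and its contents are made up for illustration. It will not help with a UTF-16 file, where nearly every line would be flagged.

```r
# Made-up demo file: line 2 contains a Latin-1 "e acute" byte (0xE9),
# which is not valid UTF-8 in this position.
demo_path <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("name,city\n"),
           charToRaw("Ana,M"), as.raw(0xE9), charToRaw("xico\n"),
           charToRaw("Bob,Roma\n")),
         demo_path)

csv_lines <- readLines(demo_path, warn = FALSE)
bad <- which(!validUTF8(csv_lines))   # line numbers that fail UTF-8 validation
print(bad)
```

The flagged line numbers tell you where to look in your editor, or suggest that a different fileEncoding is needed for the whole file.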
Upvotes: 1
Reputation: 166
The cause is an invalid encoding. I solved it by replacing every "è" with "e".
Upvotes: 8
Reputation: 571
Not sure if this is helpful, but I had a similar problem and figured out that it was because my "csv" file had a .csv suffix, but was actually a .xls file!
Upvotes: 0