Donbeo

Reputation: 17647

Error while reading csv file in R

I am having problems reading a CSV file in R:

 x=read.csv("LorenzoFerrone.csv",header=T)

Error in make.names(col.names, unique = TRUE) : 
      invalid multibyte string at '<ff><fe>N'

I can open the file in LibreOffice without any problems.

I cannot upload the file because it contains sensitive information.

What can I do?


Setting the encoding seems to solve the problem:

> x=read.csv("LorenzoFerrone.csv",fileEncoding = "UCS-2LE")
> x[2,1]
[1] Adriano Caruso
100 Levels:  Ada Adriano Caruso adriano diaz Adriano Diaz alberto ferrone Alexey ... Zia Tina
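For reference, the bytes <ff><fe> in the error are the UTF-16 little-endian byte-order mark (BOM), which is why a "UCS-2LE"/"UTF-16LE" fileEncoding works here. Below is a minimal sketch that reproduces the situation with a made-up two-column file (the column names and values are invented, since the real file is not available) and reads it with fileEncoding = "UTF-16LE". According to R's documentation for file connections, a leading BOM is removed for these two encodings.

```r
# Sketch only: build a small UTF-16LE file with a byte-order mark.
# The column names and values are made up; the real file is not available.
tmp <- tempfile(fileext = ".csv")
txt <- "Nome,Cognome\nAdriano,Caruso\nAda,Ferrone\n"
payload <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), payload), tmp)

# The first two bytes are FF FE, the same '<ff><fe>' shown in the error
bom <- readBin(tmp, what = "raw", n = 2)
print(bom)

# Decode as UTF-16LE; R strips the leading BOM for this encoding
x <- read.csv(tmp, fileEncoding = "UTF-16LE")
str(x)
```

Peeking at the first bytes with readBin() like this is a quick way to check what kind of file you actually have before guessing at encodings.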

Upvotes: 24

Views: 123922

Answers (11)

Antarqui

Reputation: 497

You can try the "latin1" encoding when reading the CSV:

 x = read.csv("LorenzoFerrone.csv", fileEncoding = "latin1", check.names = FALSE)

I add check.names = FALSE so that spaces (and other characters that are not valid in R names) in your header are not replaced with dots.
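A small sketch of what check.names changes, using a made-up header with spaces: by default read.csv() passes the column names through make.names(), which is also the function that raised the error in the question.

```r
# Made-up file whose header contains spaces
tmp <- tempfile(fileext = ".csv")
writeLines(c("first name,last name", "Adriano,Caruso"), tmp)

x_default <- read.csv(tmp)                       # names go through make.names()
x_asis    <- read.csv(tmp, check.names = FALSE)  # names kept exactly as written

names(x_default)  # "first.name" "last.name"
names(x_asis)     # "first name" "last name"
```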

As the above does not always work, another option is to use the data.table package with Latin-1 encoding:

library(data.table)

x = fread("LorenzoFerrone.csv", encoding = "Latin-1")

Upvotes: 9

VISHAL TIWARI

Reputation: 41

Re-save the file in the "CSV UTF-8" format. It worked for me.
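One caveat, hedged: files saved as "CSV UTF-8" by Microsoft applications typically start with a UTF-8 byte-order mark (EF BB BF), which can show up in R as a mangled first column name such as X.U.FEFF.id. R's documentation for file connections describes a "UTF-8-BOM" encoding that removes such a mark when reading. A minimal sketch with a made-up file:

```r
# Made-up UTF-8 file that starts with a byte-order mark (EF BB BF)
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("id,name\n1,Ada\n")), tmp)

# "UTF-8-BOM" decodes as UTF-8 and drops a leading BOM if present
x <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(x)  # "id" "name"
```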

Upvotes: 4

Charlie Lee

Reputation: 51

I found that this problem is caused by the file's encoding. I solved it by opening the file in Windows Notepad, saving it as UTF-8, reopening it in Excel (it was garbled at first), and saving it as UTF-8 again. Then it worked!
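The same re-save can be sketched inside R, assuming you know (or have guessed) the source encoding. convert_to_utf8() below is a hypothetical helper, not part of any package: it reads the file through a connection that decodes the given encoding and writes the lines back out as UTF-8. The demo data is made up, and non-ASCII text round-trips cleanly when R runs in a UTF-8 locale.

```r
# Hypothetical helper (not from any package): copy a text file to a new
# file re-encoded as UTF-8
convert_to_utf8 <- function(path, from, out = tempfile(fileext = ".csv")) {
  con <- file(path, open = "r", encoding = from)  # R decodes `from` while reading
  on.exit(close(con))
  lines <- readLines(con)
  # enc2utf8() converts native-encoded strings to UTF-8; useBytes = TRUE
  # writes those bytes without further translation
  writeLines(enc2utf8(lines), out, useBytes = TRUE)
  out
}

# Demo with a made-up UTF-16LE file that starts with a BOM
src <- tempfile(fileext = ".csv")
utf16 <- iconv("Nome,Cognome\nAda,Ferrone\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), utf16), src)

utf8_file <- convert_to_utf8(src, from = "UTF-16LE")
x <- read.csv(utf8_file, fileEncoding = "UTF-8")
```

Writing a new file rather than overwriting the original keeps the source intact in case the guessed encoding is wrong.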

Upvotes: 5

Balamurali N.R

Reputation: 353

This reads the column names as-is, so make.names() is never called and this error does not occur:

x = read.csv("LorenzoFerrone.csv", check.names = FALSE)

To remove/replace troublesome characters in column names, use this:

names(x) = iconv(names(x), to = "ASCII", sub = "")
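A small demonstration of what sub = "" does, on a made-up data frame. Unlike the line above, which converts from the native encoding, this sets from = "UTF-8" explicitly because the \u escape produces a UTF-8 string. Every byte that cannot be converted to ASCII is replaced with the empty string, so the two bytes of "à" simply disappear.

```r
# Made-up data frame whose first column name contains a non-ASCII character
x <- data.frame(1, 2)
names(x) <- c("Citt\u00e0", "Nome")

# sub = "" drops every byte that cannot be converted to ASCII
names(x) <- iconv(names(x), from = "UTF-8", to = "ASCII", sub = "")
names(x)  # "Citt" "Nome"
```

Using sub = "byte" instead keeps a visible <xx> marker for each dropped byte, which can help you see what was removed.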

Upvotes: 22

pkpkPPkafa

Reputation: 13

I know this is an old post, but I just wanted to tell non-native English speakers that if you use "," as the decimal separator,

Upvotes: 0

Mariano

Reputation: 1

I solved the problem by removing all diacritics (i.e. accent marks) from the text. My headers were written in Spanish and contained some accented characters, so I replaced them with plain letters (México became Mexico) and the problem was solved.
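If you would rather do this replacement in R (for example on names(x) after the text has been decoded correctly), here is a minimal sketch with a hypothetical helper, strip_accents(). The list of accented letters is an assumption covering common Spanish characters only and is not exhaustive; it also does not fix a wrong encoding by itself.

```r
# Hypothetical helper: replace a few accented letters with plain ASCII ones
strip_accents <- function(s) {
  accented <- c("\u00e1", "\u00e9", "\u00ed", "\u00f3", "\u00fa", "\u00f1", "\u00fc",
                "\u00c1", "\u00c9", "\u00cd", "\u00d3", "\u00da", "\u00d1")
  plain    <- c("a", "e", "i", "o", "u", "n", "u",
                "A", "E", "I", "O", "U", "N")
  for (i in seq_along(accented)) {
    # fixed = TRUE: match the literal character, not a regular expression
    s <- gsub(accented[i], plain[i], s, fixed = TRUE)
  }
  s
}

strip_accents(c("M\u00e9xico", "A\u00f1o", "Nome"))  # "Mexico" "Ano" "Nome"
```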

Upvotes: 0

MarcoD

Reputation: 137

Not sure if this helps, but I just had a similar issue, which I solved by removing the " characters from the CSV I was trying to import. The first row of the file had the column names written as "colname","colname2","etc"; after I removed all the " characters, R read the CSV just fine.
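The same edit can be done in R without touching the file. This is a minimal sketch on a made-up file that strips double quotes from the header line only and parses the result with read.csv(text = ...). Standard double quotes around header fields are valid CSV and read.csv() normally handles them, so this is mainly useful when the quoting in a particular file is malformed.

```r
# Made-up file with a quoted header
tmp <- tempfile(fileext = ".csv")
writeLines(c('"colname","colname2","etc"', "1,2,3"), tmp)

lines <- readLines(tmp)
lines[1] <- gsub('"', "", lines[1], fixed = TRUE)  # strip double quotes from the header only
x <- read.csv(text = lines)
names(x)  # "colname" "colname2" "etc"
```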

Upvotes: 0

Sanjay Kulkarni

Reputation: 36

You need to specify the correct delimiter in the sep argument.
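For example, many European locales export CSV files with ";" as the field separator and "," as the decimal mark. A minimal sketch with a made-up file: set sep (and dec) explicitly, or use base R's read.csv2(), whose defaults are exactly sep = ";" and dec = ",".

```r
tmp <- tempfile(fileext = ".csv")
# Made-up file: ";" separates fields and "," is the decimal mark
writeLines(c("name;score", "Ada;1,5", "Adriano;2,25"), tmp)

x1 <- read.csv(tmp, sep = ";", dec = ",")  # set the delimiter (and decimal mark) explicitly
x2 <- read.csv2(tmp)                       # read.csv2 defaults to sep = ";" and dec = ","
str(x2)
```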

Upvotes: 2

AlonG

Reputation: 69

This is typically an encoding issue. You can try changing the encoding, or else delete the offending character (just use your favorite editor and replace all instances). In some cases R reports the character's location, for example:

invalid multibyte string 1847

This should make your life easier. Also note that you may need to repeat this process several times (deleting every offending character or trying several encodings).
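When R does not report a position, one rough way to locate the problem yourself, assuming the file is supposed to be UTF-8, is to read the raw lines with readLines() and flag the ones that fail validUTF8() from base R. The file below is made up; its third line contains the byte 0xFF, which never appears in valid UTF-8.

```r
tmp <- tempfile(fileext = ".csv")
# Made-up file whose third line contains the byte 0xFF
writeBin(c(charToRaw("a,b\n1,2\n3,"), as.raw(0xff), charToRaw("\n")), tmp)

lines <- readLines(tmp, warn = FALSE)
bad <- which(!validUTF8(lines))  # numbers of the lines that are not valid UTF-8
bad                              # 3
charToRaw(lines[bad[1]])         # 33 2c ff: the raw bytes of the first bad line
```

Looking at the raw bytes of the flagged lines often hints at the real encoding (for example, single high bytes such as e8 for "è" suggest Latin-1).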

Upvotes: 1

Antonello Salis

Reputation: 166

The cause is an invalid encoding. I solved it by replacing every "è" with "e".

Upvotes: 8

fredtal

Reputation: 571

Not sure if this helps, but I had a similar problem and found that my "CSV" file had a .csv extension but was actually an .xls file!
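You can check for this from R by looking at the file's first bytes. Legacy binary .xls files start with the OLE2 signature D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP archives starting with 50 4B 03 04 ("PK"). sniff_format() below is a hypothetical helper, a sketch rather than a complete file-type detector, and the demo file is fake. If the file turns out to be a spreadsheet, a reader such as readxl::read_excel() is the right tool rather than read.csv().

```r
# Hypothetical helper (not from any package): guess what a file really is
# by looking at its first bytes
sniff_format <- function(path) {
  magic <- readBin(path, what = "raw", n = 8)
  starts_with <- function(sig) {
    length(magic) >= length(sig) && identical(magic[seq_along(sig)], sig)
  }
  if (starts_with(as.raw(c(0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1)))) return("xls (OLE2)")
  if (starts_with(as.raw(c(0x50, 0x4b, 0x03, 0x04)))) return("xlsx or other ZIP")
  if (starts_with(as.raw(c(0xff, 0xfe)))) return("UTF-16LE text with BOM")
  if (starts_with(as.raw(c(0xef, 0xbb, 0xbf)))) return("UTF-8 text with BOM")
  "no known signature (plain text?)"
}

# Demo: a fake file that only carries the .xls signature
fake_xls <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1, 0x00, 0x00)), fake_xls)
sniff_format(fake_xls)  # "xls (OLE2)"
```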

Upvotes: 0
