jeffrey
jeffrey

Reputation: 2096

Read.table() invalid multibyte string error: Find the strings causing the error

I know there are many similar questions regarding problems with read.table(). However I couldn't manage to import the following data sheet into R, which consists of a trading universe with different assets, their descriptions and groups:

https://wikifolio.blob.core.windows.net/prod-documents/Investment_Universe.xlsx

I saved the file as dat.csv and tried read.table("dat.csv", header=T, sep=";", stringsAsFactors=F, quote="") with different encodings (e.g., latin1), but I always get the invalid multibyte string error. Moreover, I tried to find and replace all 'ä', 'ö' and 'ü' using my text editor.

How can I find the rows, where the errors occur when reading the table into R? At the moment, I don't know where to look for the strings causing the problem.

Thanks!

Upvotes: 3

Views: 2808

Answers (1)

Jaap
Jaap

Reputation: 83215

The problem is in one of your column names which contains the ü character. Use check.names = FALSE in your read.csv2:

 dat <- read.csv2("dat.csv", check.names = FALSE)

this will read you file correctly:

> head(dat)
          ISIN    WKN SecurityType            Bezeichnung Anlageuniversum (Gruppe)      Anlageuniversum Whitelist f\x81r institutionelle Produkte _ Schweiz
1 AN8068571086 853390        Stock           SCHLUMBERGER            Aktien Europa Aktien Europa Select                                                   X
2 AT000000STR1 A0M23V        Stock                STRABAG            Aktien Europa Aktien Europa Select                                                   X
3 AT00000AMAG3 A1JFYU        Stock AMAG AUSTRIA METALL AG            Aktien Europa Aktien Europa Select                                                   X
4 AT00000ATEC9 A0LFDH        Stock       A-TEC INDUSTRIES            Aktien Europa Aktien Europa Select                                                   X
5 AT00000BENE6 A0LCPZ        Stock                BENE AG            Aktien Europa Aktien Europa Select                                                   X
6 AT00000FACC2 A1147K        Stock                FACC AG            Aktien Europa Aktien Europa Select                                                   X

Then you can change your column names with for example:

names(dat) <- c("ISIN","WKN","SecurityType","Bezeichnung",
                "Anlageuniversum_Gruppe","Anlageuniversum","Whitelist_Schweiz")

Another possibility is reading your file without the headers:

dat <- read.csv2("dat.csv", header = FALSE, skip = 1)

Upvotes: 2

Related Questions