Reputation: 2096
I know there are many similar questions regarding problems with read.table()
. However I couldn't manage to import the following data sheet into R, which consists of a trading universe with different assets, their descriptions and groups:
https://wikifolio.blob.core.windows.net/prod-documents/Investment_Universe.xlsx
I saved the file as dat.csv
and tried read.table("dat.csv", header=T, sep=";", stringsAsFactors=F, quote="")
with different encodings (e.g., latin1
), but I always get the invalid multibyte string error. Moreover, I tried to find and replace all 'ä', 'ö' and 'ü' using my text editor.
How can I find the rows, where the errors occur when reading the table into R? At the moment, I don't know where to look for the strings causing the problem.
Thanks!
Upvotes: 3
Views: 2808
Reputation: 83215
The problem is in one of your column names which contains the ü
character. Use check.names = FALSE
in your read.csv2
:
dat <- read.csv2("dat.csv", check.names = FALSE)
this will read you file correctly:
> head(dat)
ISIN WKN SecurityType Bezeichnung Anlageuniversum (Gruppe) Anlageuniversum Whitelist f\x81r institutionelle Produkte _ Schweiz
1 AN8068571086 853390 Stock SCHLUMBERGER Aktien Europa Aktien Europa Select X
2 AT000000STR1 A0M23V Stock STRABAG Aktien Europa Aktien Europa Select X
3 AT00000AMAG3 A1JFYU Stock AMAG AUSTRIA METALL AG Aktien Europa Aktien Europa Select X
4 AT00000ATEC9 A0LFDH Stock A-TEC INDUSTRIES Aktien Europa Aktien Europa Select X
5 AT00000BENE6 A0LCPZ Stock BENE AG Aktien Europa Aktien Europa Select X
6 AT00000FACC2 A1147K Stock FACC AG Aktien Europa Aktien Europa Select X
Then you can change your column names with for example:
names(dat) <- c("ISIN","WKN","SecurityType","Bezeichnung",
"Anlageuniversum_Gruppe","Anlageuniversum","Whitelist_Schweiz")
Another possibility is reading your file without the headers:
dat <- read.csv2("dat.csv", header = FALSE, skip = 1)
Upvotes: 2