Reputation: 115
For an internal project we have to read some extremely large files (>2.5GB). We used to read these files by calling
laf_open(detect_dm_csv(filename = file_path))
and then looping over each line, reading substrings, and manually creating a dataframe from those. For some files this works just fine, but for others the method fails with:
invalid multibyte string at '<e4> '
When looking at the files, I can see that each row contains this multibyte character in the very same position (there is always a spot where the string shows \xe4). My hypothesis is that the files are being read with the wrong encoding, but I don't see how detect_dm_csv allows one to choose the encoding to use. Any ideas on how I can fix this?
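For what it's worth, byte 0xE4 is 'ä' in ISO-8859-1 (Latin-1), so the files may simply be Latin-1 rather than UTF-8. A minimal check of this hypothesis, using only base R and assuming file_path points at one of the affected files, is to declare Latin-1 on the connection and see whether the byte renders as 'ä':

con <- file(file_path, encoding = "ISO-8859-1")  # declare Latin-1 on the connection
head_lines <- readLines(con, n = 5)              # lines are re-encoded to the native encoding as they are read
close(con)
print(head_lines)                                # the \xe4 spot should now display as "ä"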
Upvotes: 0
Views: 789
Reputation: 282
You can pass the file's encoding when reading the CSV, for example:
df <- read.csv(file_path,
               fileEncoding = "ISO-8859-1",  # re-encode the input from Latin-1 while reading
               header = TRUE,
               stringsAsFactors = FALSE)
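Note that for files of the size mentioned in the question (>2.5GB), read.csv loads everything into memory, which may not be feasible. A sketch of a chunked alternative that keeps the original line-by-line approach while still declaring the encoding on the connection (process_chunk is a hypothetical placeholder for your existing substring/dataframe logic):

con <- file(file_path, open = "r", encoding = "ISO-8859-1")
chunk_size <- 100000L                      # tune to available memory
repeat {
  lines <- readLines(con, n = chunk_size)  # re-encoded from Latin-1 as they are read
  if (length(lines) == 0L) break
  process_chunk(lines)                     # hypothetical: extract substrings, build dataframe rows
}
close(con)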
Upvotes: 1