J. Grünenwald

Reputation: 115

'invalid multibyte string' error when reading a large text file

For an internal project, we have to read some extremely large files (>2.5 GB). We used to read these files by calling

laf_open(detect_dm_csv(filename = file_path))

and then looping over each line, extracting substrings, and building a data frame from them manually. This works fine for some files, but for others it fails with:

invalid multibyte string at '<e4> '

Looking at the files, I can see that every row contains this character at exactly the same position (the string always shows \xe4 at that spot). My hypothesis is that the files are being read with the wrong encoding, but I don't see any way to tell detect_dm_csv which encoding to use. Any ideas how I can fix this?
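A minimal sketch for testing the encoding hypothesis, assuming the file is Latin-1 (ISO-8859-1) encoded. The helpers `find_invalid_utf8` and `latin1_to_utf8` are hypothetical names, not part of LaF. The first reports which lines are not valid UTF-8. The second re-encodes strings with base R's `iconv`, which could be applied to the strings returned in the existing loop before extracting substrings.

```r
# Assumption: `path` is a file that triggers the error and is Latin-1 encoded.
# Both helpers are hypothetical illustrations, not LaF functions.

# Return the indices of the first `n` lines that are not valid UTF-8.
find_invalid_utf8 <- function(path, n = 1000L) {
  # readLines neither validates nor re-encodes here; it keeps the raw bytes
  lines <- readLines(path, n = n, warn = FALSE)
  which(!validUTF8(lines))
}

# Re-encode strings from Latin-1 to UTF-8 before calling substr() and friends.
latin1_to_utf8 <- function(x) iconv(x, from = "latin1", to = "UTF-8")

# 0xE4 is "ä" in Latin-1; in UTF-8 the same character is the two bytes C3 A4.
charToRaw(latin1_to_utf8("\xe4"))
# [1] c3 a4
```

If `find_invalid_utf8` flags every line and the converted strings show "ä" where `\xe4` was, that supports the Latin-1 hypothesis.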

Upvotes: 0

Views: 789

Answers (1)

Antreas Stefopoulos

Reputation: 282

You can declare the file's encoding when reading the CSV, for example:

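# fileEncoding re-encodes the file on read; the `encoding` argument only marks
# strings and recognises "latin1"/"UTF-8"/"bytes", not "iso-8859-1"
df <- read.csv(file_path,
               fileEncoding = "latin1",
               header = TRUE,
               stringsAsFactors = FALSE)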
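For files larger than 2.5 GB, reading everything with `read.csv` can be slow or use a lot of memory. A minimal sketch, assuming the file is Latin-1 encoded, is to re-encode it to UTF-8 once, streaming it in chunks through a connection opened with `encoding = "latin1"`, and then read the converted file with the original `laf_open(detect_dm_csv(...))` approach. The helper name `convert_latin1_to_utf8` and the default chunk size are hypothetical choices, not part of LaF.

```r
# Hypothetical helper: stream-convert a Latin-1 file to UTF-8 chunk by chunk,
# so the whole file never has to be held in memory at once.
convert_latin1_to_utf8 <- function(in_path, out_path, chunk_size = 100000L) {
  # The input connection re-encodes from Latin-1 as lines are read
  con_in <- file(in_path, open = "r", encoding = "latin1")
  on.exit(close(con_in), add = TRUE)
  # The output connection writes the text back out as UTF-8
  con_out <- file(out_path, open = "w", encoding = "UTF-8")
  on.exit(close(con_out), add = TRUE)
  repeat {
    # readLines on an open connection continues from the current position
    lines <- readLines(con_in, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break
    writeLines(lines, con_out)
  }
  invisible(out_path)
}
```

The conversion costs one extra pass over the file, but afterwards the existing LaF-based loop can stay unchanged.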

Upvotes: 1
