Reputation: 115
For an internal project we have to read some extremely large files (>2.5GB). We used to read these files by calling
laf_open(detect_dm_csv(filename = file_path))
and then looping over each line, reading substrings, and manually creating a dataframe from those. For some files this works just fine, but for others the method fails with:
invalid multibyte string at '<e4> '
When looking at the files, I can see that each row contains this multibyte character in the very same position (there is always a spot where the string shows \xe4). My hypothesis is that the files are being read with the wrong encoding, but I don't see how detect_dm_csv allows one to choose the encoding to use. Any ideas on how I can fix this?
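For what it's worth, byte 0xE4 is 'ä' in ISO-8859-1 (Latin-1), so the files may simply be Latin-1 rather than UTF-8. A minimal check of this hypothesis, using only base R and assuming file_path points at one of the affected files, is to declare Latin-1 on the connection and see whether the byte renders as 'ä':

con <- file(file_path, encoding = "ISO-8859-1")  # declare Latin-1 on the connection
head_lines <- readLines(con, n = 5)              # lines are re-encoded to the native encoding as they are read
close(con)
print(head_lines)                                # the \xe4 spot should now display as "ä"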
Upvotes: 0
Views: 789
Reputation: 282
You can pass the file's encoding when reading the CSV, for example:
df <- read.csv(file_path,
               fileEncoding = "ISO-8859-1",  # re-encode the input from Latin-1 while reading
               header = TRUE,
               stringsAsFactors = FALSE)
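Note that for files of the size mentioned in the question (>2.5GB), read.csv loads everything into memory, which may not be feasible. A sketch of a chunked alternative that keeps the original line-by-line approach while still declaring the encoding on the connection (process_chunk is a hypothetical placeholder for your existing substring/dataframe logic):

con <- file(file_path, open = "r", encoding = "ISO-8859-1")
chunk_size <- 100000L                      # tune to available memory
repeat {
  lines <- readLines(con, n = chunk_size)  # re-encoded from Latin-1 as they are read
  if (length(lines) == 0L) break
  process_chunk(lines)                     # hypothetical: extract substrings, build dataframe rows
}
close(con)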
Upvotes: 1