JasonZhao

Reputation: 24

How to read data from a big txt file in Dart

When I read data from a big txt file block by block, I get the error below:

Unfinished UTF-8 octet sequence (at offset 4096)

Code:

File file = File(path!);
RandomAccessFile _raf = await file.open();
_raf.setPositionSync(skip ?? 0);
var data = _raf.readSync(block);// block = 64*64 
content.value = utf8.decode(data.toList());

Upvotes: 0

Views: 1300

Answers (1)

Chart Chuo

Reputation: 44

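UTF-8 is a variable-length encoding. The error occurs because the block you read does not start and end on UTF-8 character boundaries. One workaround is to trim the bytes on the left and right before calling utf8.decode. This drops any partial character at the start of the block and the last character at the end. Alternatively, you can read a few extra bytes so that the last character is covered and the data ends on a UTF-8 boundary.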

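import 'dart:convert';
import 'dart:io';

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
bool isDataByte(int i) {
  return (i & 0xc0) == 0x80;
}

Future<void> main(List<String> arguments) async {
  const skip = 0; // assumed start offset; use your own value here
  var _raf = await File('utf8.txt').open();
  _raf.setPositionSync(skip);
  var data = _raf.readSync(8 * 8);
  await _raf.close();

  var utfData = data.toList();
  int l, r;
  // Skip continuation bytes left over from a character cut at the start.
  // The bounds check comes first so the index is never out of range.
  for (l = 0; l < utfData.length && isDataByte(utfData[l]); l++) {}

  // Walk back to the lead byte of the last character; sublist excludes it.
  for (r = utfData.length - 1; r > l && isDataByte(utfData[r]); r--) {}
  var value = r > l ? utf8.decode(utfData.sublist(l, r)) : '';
  print(value);
}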
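
To see why the bit test in isDataByte works, here is a minimal sketch (the helper names isContinuationByte and decodesCleanly are only illustrative, they are not part of dart:convert). In UTF-8, every byte after the first byte of a multi-byte character has the form 10xxxxxx, and utf8.decode throws a FormatException when a sequence is cut short:

```dart
import 'dart:convert';

/// True when [byte] is a UTF-8 continuation byte (bit pattern 10xxxxxx).
/// Hypothetical helper, same test as isDataByte in the answer.
bool isContinuationByte(int byte) => (byte & 0xC0) == 0x80;

/// Returns true if [bytes] is complete, valid UTF-8, false otherwise.
/// Hypothetical helper that wraps utf8.decode.
bool decodesCleanly(List<int> bytes) {
  try {
    utf8.decode(bytes);
    return true;
  } on FormatException {
    // Thrown for a truncated or otherwise malformed sequence.
    return false;
  }
}
```

For example, utf8.encode('€') gives the three bytes 0xE2 0x82 0xAC. The last two are continuation bytes, and decodesCleanly returns false when it is given only the first two bytes.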

Optionally, read 4 more bytes and extend the range to cover the last character:


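import 'dart:convert';
import 'dart:io';

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
bool isDataByte(int i) {
  return (i & 0xc0) == 0x80;
}

Future<void> main(List<String> arguments) async {
  const skip = 0; // assumed start offset; use your own value here
  var _raf = await File('utf8.txt').open();
  _raf.setPositionSync(skip);
  var block = 8 * 8;
  // Read up to 4 extra bytes so a character cut at the block end can be completed.
  var data = _raf.readSync(block + 4);
  await _raf.close();

  var utfData = data.toList();
  int l, r;
  // Skip continuation bytes left over from a character cut at the start.
  for (l = 0; l < utfData.length && l < block && isDataByte(utfData[l]); l++) {}

  // Start at the block end (or the end of the data near EOF) and extend
  // while the bytes still belong to the last character.
  r = block < utfData.length ? block : utfData.length;
  for (; r < utfData.length && isDataByte(utfData[r]); r++) {}

  var value = utf8.decode(utfData.sublist(l, r));
  print(value);
}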
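
If you do not need random access, another option (a different technique from the trimming above) is to let the decoder carry unfinished sequences over from one block to the next with chunked conversion. A rough sketch, assuming the bytes are already in memory; decodeInBlocks is a hypothetical helper name:

```dart
import 'dart:convert';

/// Decodes [bytes] as UTF-8 by feeding them to the decoder in pieces of
/// [blockSize] bytes (must be greater than 0). Hypothetical helper.
String decodeInBlocks(List<int> bytes, int blockSize) {
  final out = StringBuffer();
  // The chunked sink remembers an incomplete sequence between add() calls.
  final sink = utf8.decoder.startChunkedConversion(
    StringConversionSink.fromStringSink(out),
  );
  for (var start = 0; start < bytes.length; start += blockSize) {
    final end =
        start + blockSize < bytes.length ? start + blockSize : bytes.length;
    sink.add(bytes.sublist(start, end));
  }
  // close() throws a FormatException if the input ends mid-character.
  sink.close();
  return out.toString();
}
```

For a real file, File(path).openRead().transform(utf8.decoder) should behave the same way as a stream of strings, since the decoder holds on to the unfinished bytes until the next chunk arrives.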

Upvotes: 1
