ulrich

Reputation: 3587

error: unpack_from requires a buffer

I am using struct to parse a fixed-width text file. Here are the first two rows:

    Sat Jan  3 18:15:05 2009    62e907b15cbf27d5425399ebf6f0fb50ebb88f18    4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b +              50.00000000
    Fri Jan  9 02:54:25 2009    119b098e2e980a229e139a9ed01a469e518e6f26    0e3e2357e806b6cdb1f70b54c3a3a17b6714ee1f0e68bebb44a74b1efd512098 +              50.00000000

I convert it into a CSV using the following code:

import csv
import struct

fieldwidths = (-4, 24, -4, 40,-4,64,-1,1,25)  # negative widths represent ignored padding fields
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                        for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from

c = csv.writer(open("/home/ulrich/Desktop/disertation/sample_parsed_blch1.csv", "wb"))
with open('/home/ulrich/Desktop/disertation/sample_parsed_blch2.txt') as f:
    for line in f:
        fields = parse(line)
        c.writerow(fields)

It works in that it produces the CSV, but I still get this error message:

error: unpack_from requires a buffer of at least 167 bytes

Upvotes: 4

Views: 7017

Answers (3)

ulrich

Reputation: 3587

I ended up adding a regex condition so that only rows with the expected format are parsed:

import csv
import struct
import re
pattern = re.compile(r"\s{4}\w{3}\s{1}.+")  # raw string so the backslashes reach the regex engine unchanged


fieldwidths = (-4, 24, -4, 40,-4,64,-1,1,25)  # negative widths represent ignored padding fields
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                        for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from

c = csv.writer(open("/media/ulrich/FC9A-C444/all_tx.csv", "wb"))
with open('/media/ulrich/FC9A-C444/all_tx.txt') as f:
    for line in f:
        if pattern.match(line):
            fields = parse(line)
            c.writerow(fields)

Upvotes: 0

Serge Ballesta

Reputation: 148900

This kind of error can be caused by extra characters at the end of the file. Some Windows editors are known to append a Ctrl-Z to text files, a holdover from the days when MS-DOS maintained compatibility with CP/M.

You can easily get rid of it by skipping shorter lines:

for line in f:
    if len(line) >= minsize:  # e.g. 100; fieldstruct.size (167 here) is the strict minimum
        fields = parse(line)
        c.writerow(fields)
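
As a minimal sketch of the length check above, the threshold can be taken from the struct itself rather than hard-coded. This assumes Python 3 and lines read as bytes (file opened in 'rb' mode); `parse_lines` is a hypothetical helper name, not part of the original code.

```python
import struct

# Field widths copied from the question; negative widths are skipped padding.
fieldwidths = (-4, 24, -4, 40, -4, 64, -1, 1, 25)
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                     for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)


def parse_lines(lines):
    """Yield unpacked fields, skipping lines shorter than the struct (hypothetical helper)."""
    for line in lines:
        # fieldstruct.size is the number of bytes unpack_from needs (167 for this layout).
        if len(line) >= fieldstruct.size:
            yield fieldstruct.unpack_from(line)
```

Using `fieldstruct.size` keeps the check in sync with the format if the field widths ever change.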

Upvotes: 1

falsetru

Reputation: 369074

If you iterate over the file, each line will contain a trailing newline. You need to remove it:

....
for line in f:
    fields = parse(line.rstrip('\r\n'))
    c.writerow(fields)
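
A rough sketch of how stripping could be combined with a length guard, since removing the line ending alone does not make a short line long enough to unpack. It assumes Python 3 with lines read as bytes; `parse_stripped` is a hypothetical helper name, and the format string is the one the question's code generates.

```python
import struct

# Same layout as the question: 'x' is a skipped pad byte, 's' is a byte string.
fieldstruct = struct.Struct('4x 24s 4x 40s 4x 64s 1x 1s 25s')


def parse_stripped(line):
    """Strip the line ending, then unpack; return None for short lines (hypothetical helper)."""
    stripped = line.rstrip(b'\r\n')
    if len(stripped) < fieldstruct.size:
        return None  # e.g. a blank line or a stray Ctrl-Z at the end of the file
    return fieldstruct.unpack_from(stripped)
```

Returning `None` leaves it to the caller to decide whether to skip or log the malformed row.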

Upvotes: 0
