David542

Reputation: 110267

Using csv.Sniffer on a bytes-like object

I have the following code which I'm using to infer the field separator and line terminator in a csv file:

import csv

first_line = b'132605,1\r\n'
dialect = csv.Sniffer().sniff(first_line)

From the above, I'd expect csv.Sniffer to infer that the separator is , and the line terminator is \r\n. However, it raises the following error:

TypeError: cannot use a string pattern on a bytes-like object

What would be the best way to fix this?

Note: I'm opening the file in binary ('b') mode so that I can see every character, for example:

>>> open('10_no_headers.csv','r+b').read()[:10]
b'132605,1\r\n'

>>> open('10_no_headers.csv','r').read()[:10]
'132605,1\n1' # doesn't show the \r

Upvotes: 1

Views: 1080

Answers (2)

buran

Reputation: 14233

Open in 'r' mode and supply newline='':

import csv

with open('foo.txt', 'w', newline='') as f:  # newline='' writes '\r\n' as-is on every platform
    f.write('132605,1\r\n')

with open('foo.txt', 'r') as f:
    print(repr(next(f)))

with open('foo.txt', 'rb') as f:
    print(repr(next(f)))

with open('foo.txt', 'r', newline='') as f:
    line = next(f)
    dialect = csv.Sniffer().sniff(line)
    print(repr(line))
    print('FIELD:', repr(dialect.delimiter), 'LINE:', repr(dialect.lineterminator))

Output:

'132605,1\n'
b'132605,1\r\n'
'132605,1\r\n'
FIELD: ',' LINE: '\r\n'

From the docs:

newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:

  • When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
  • When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
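To make the reading rule above concrete, here is a minimal sketch that writes the sample in binary mode and reads it back with newline=None and newline=''. The scratch file name and the first_line helper are made up for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; written in binary mode so the bytes on disk
# are exactly b'132605,1\r\n' regardless of platform.
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.csv')
with open(path, 'wb') as f:
    f.write(b'132605,1\r\n')


def first_line(newline):
    """Read the first line in text mode with the given newline setting."""
    with open(path, 'r', newline=newline) as f:
        return f.readline()


translated = first_line(None)   # universal newlines: '\r\n' becomes '\n'
untranslated = first_line('')   # universal newlines, line ending left untouched

print(repr(translated))    # '132605,1\n'
print(repr(untranslated))  # '132605,1\r\n'
```

So text mode with newline='' shows the \r just like binary mode does, but hands back str instead of bytes, which is what csv.Sniffer needs.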

Upvotes: 3

David542

Reputation: 110267

One possible option would be to decode it before passing it to the Sniffer. For example:

import csv

first_line = b'132605,1\r\n'
dialect = csv.Sniffer().sniff(first_line.decode('utf-8'))

print('FIELD:', repr(dialect.delimiter), 'LINE:', repr(dialect.lineterminator))
# FIELD: ',' LINE: '\r\n'
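One caveat: as far as I can tell from CPython's csv module, the dialect returned by sniff() always has lineterminator set to '\r\n', so the value printed above does not confirm what the file actually uses. A minimal sketch that takes the delimiter from the Sniffer and reads the line terminator from the decoded sample itself is below. sniff_bytes is a hypothetical helper, not part of the csv module, and it assumes the sample decodes as UTF-8.

```python
import csv


def sniff_bytes(sample, encoding='utf-8'):
    """Guess (delimiter, line_terminator) from a bytes sample.

    Hypothetical helper, not part of the csv module.
    """
    text = sample.decode(encoding)
    delimiter = csv.Sniffer().sniff(text).delimiter
    # Check '\r\n' first so it is not mistaken for a bare '\n' or '\r'.
    for terminator in ('\r\n', '\n', '\r'):
        if terminator in text:
            return delimiter, terminator
    return delimiter, None  # sample has no line ending at all


print(sniff_bytes(b'132605,1\r\n'))           # (',', '\r\n')
print(sniff_bytes(b'132605,1\n132606,2\n'))   # (',', '\n')
```

This only looks at the first kind of line ending it finds, so a file with mixed endings would need a closer look.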

Upvotes: 1
