Reputation: 110267
I have the following code which I'm using to infer the field separator and line terminator in a csv file:
import csv
first_line = b'132605,1\r\n'
dialect = csv.Sniffer().sniff(first_line)
From the above, I'd expect the csv Sniffer to be able to infer that the separator is ',' and the line terminator is '\r\n'. However, it raises the following error:
TypeError: cannot use a string pattern on a bytes-like object
What would be the best way to fix this?
Note: the reason I'm opening the file in b mode is so that I can see all the characters, for example:
>>> open('10_no_headers.csv','r+b').read()[:10]
b'132605,1\r\n'
>>> open('10_no_headers.csv','r').read()[:10]
'132605,1\n1' # doesn't show the \r
Upvotes: 1
Views: 1080
Reputation: 14233
Open the file in 'r' mode and supply newline='':
import csv

# newline='' on write keeps '\r\n' from being expanded to '\r\r\n' on Windows
with open('foo.txt', 'w', newline='') as f:
    f.write('132605,1\r\n')

with open('foo.txt', 'r') as f:  # default text mode: line endings become '\n'
    print(repr(next(f)))

with open('foo.txt', 'rb') as f:  # binary mode: raw bytes
    print(repr(next(f)))

with open('foo.txt', 'r', newline='') as f:  # text mode, line endings untranslated
    line = next(f)

dialect = csv.Sniffer().sniff(line)
print(repr(line))
print('FIELD:', repr(dialect.delimiter), 'LINE:', repr(dialect.lineterminator))
output
'132605,1\n'
b'132605,1\r\n'
'132605,1\r\n'
FIELD: ',' LINE: '\r\n'
From the documentation for open(): newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', and '\r\n'. It works as follows:
- When reading input from the stream, if newline is None, universal newlines mode is enabled. Lines in the input can end in '\n', '\r', or '\r\n', and these are translated into '\n' before being returned to the caller. If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated.
- When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep. If newline is '' or '\n', no translation takes place. If newline is any of the other legal values, any '\n' characters written are translated to the given string.
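To make the reading rules concrete, here is a small sketch (not part of the original answer) that reads the same bytes back with different newline settings. The temporary file and the helper name read_first_line are only there for illustration.

```python
import os
import tempfile


def read_first_line(path, newline):
    """Return the first line of `path` as seen with the given `newline` setting."""
    with open(path, 'r', newline=newline) as f:
        return f.readline()


# Write raw bytes so that no newline translation happens on any platform.
fd, path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'wb') as f:
    f.write(b'132605,1\r\n2,3\r\n')

translated = read_first_line(path, None)   # universal newlines, translated
untranslated = read_first_line(path, '')   # universal newlines, untranslated
split_on_lf = read_first_line(path, '\n')  # only '\n' ends a line
split_on_cr = read_first_line(path, '\r')  # only '\r' ends a line
os.remove(path)

print(repr(translated))    # '132605,1\n'
print(repr(untranslated))  # '132605,1\r\n'
print(repr(split_on_lf))   # '132605,1\r\n'
print(repr(split_on_cr))   # '132605,1\r'
```

Only newline=None hides the '\r'. Every other setting hands the line ending back untouched, which is why newline='' is enough to see it without going to binary mode.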
Upvotes: 3
Reputation: 110267
One option is to decode the bytes before passing them to the Sniffer. For example:
import csv
first_line = b'132605,1\r\n'
dialect = csv.Sniffer().sniff(first_line.decode('utf-8'))
print('FIELD:', repr(dialect.delimiter), 'LINE:', repr(dialect.lineterminator))
output
FIELD: ',' LINE: '\r\n'
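One caveat, which is worth checking on your own Python version: as far as I know, CPython's csv.Sniffer never actually inspects line endings, and the dialect it returns always has lineterminator set to '\r\n'. If you really need the file's line terminator, you would have to look at the raw bytes yourself. Below is a rough sketch, where detect_line_terminator is a hypothetical helper (not part of the csv module) and the possibility of newlines embedded in quoted fields is ignored:

```python
import csv


def detect_line_terminator(raw):
    """Hypothetical helper: return the first line ending found in `raw` bytes.

    Returns '\r\n', '\r' or '\n', or None if the sample has no line ending.
    Does not account for newlines inside quoted fields.
    """
    for i, byte in enumerate(raw):
        if byte == 0x0D:  # '\r', possibly followed by '\n'
            return '\r\n' if raw[i + 1:i + 2] == b'\n' else '\r'
        if byte == 0x0A:  # bare '\n'
            return '\n'
    return None


# A sample with Unix line endings: the sniffed dialect does not reflect them.
unix_sample = b'132605,1\n'
sniffed = csv.Sniffer().sniff(unix_sample.decode('utf-8'))

print('sniffed: ', repr(sniffed.lineterminator))
print('detected:', repr(detect_line_terminator(unix_sample)))
```

The delimiter from the Sniffer is still useful. Only the line terminator needs this kind of separate check.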
Upvotes: 1