Reputation: 311
When using automatic separator detection to read CSV files (pd.read_csv(file_path, sep=None)), pandas tries to infer the delimiter (or separator).
Is there a way to retrieve the result of this inference (the value that was finally used for sep)?
EDIT
I am looking specifically for a method that uses the pandas object that is returned by read_csv. I use version 0.20.2 of pandas.
Upvotes: 19
Views: 15604
Reputation: 402813
If all you want to do is detect the dialect of a CSV file (without loading your data), you can use the built-in csv.Sniffer class from the standard library:
The Sniffer class is used to deduce the format of a CSV file.
In particular, the sniff method:
sniff(sample, delimiters=None)
Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.
Here's an example of its usage:
import csv

with open('example.csv', 'r', newline='') as csvfile:
    # Sniff the first line only; pass a larger sample for irregular files
    dialect = csv.Sniffer().sniff(csvfile.readline())
    print(dialect.delimiter)
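A single line can be too small a sample for files with irregular rows, so here is a minimal sketch that sniffs a larger chunk and restricts the candidate delimiters. The helper name detect_delimiter, its sample_size and candidates parameters, and the temporary test file are all illustrative assumptions, not part of the csv module or the answer above.

```python
import csv
import os
import tempfile


def detect_delimiter(path, sample_size=4096, candidates=",;\t|"):
    """Return the delimiter csv.Sniffer infers from the start of the file.

    Hypothetical helper: sample_size and candidates are illustrative defaults.
    """
    with open(path, "r", newline="") as csvfile:
        sample = csvfile.read(sample_size)
    # Restricting the candidates keeps the sniffer from choosing a letter or digit.
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Build a small semicolon-separated file to try the helper on.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("name;age;city\nalice;30;paris\nbob;25;rome\n")
    tmp_path = tmp.name

detected = detect_delimiter(tmp_path)
os.remove(tmp_path)
print(repr(detected))
```

The detected value can then be passed explicitly as sep to pd.read_csv, which avoids relying on what pandas inferred internally.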
Upvotes: 10
Reputation: 457
Based on RHSmith159's answer, you can do it like this. With iterator=True, read_csv returns a TextFileReader object rather than a DataFrame, so you have to concat the reader after reading the CSV to get the DataFrame. You may also need to set the engine to python to avoid a ParserWarning, as in the code below:
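import pandas as pd

# iterator=True makes read_csv return a TextFileReader instead of a DataFrame
reader = pd.read_csv(filename, sep=None, iterator=True, engine='python')
# Concatenating the chunks yields the full DataFrame
df = pd.concat(reader)
delimiter = reader._engine.data.dialect.delimiter
print(delimiter)
print(df)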
Upvotes: -1
Reputation: 1592
I think you can do this without having to import csv:
# _engine is a private attribute, so this may break in other pandas versions
reader = pd.read_csv(file_path, sep=None, iterator=True)
inferred_sep = reader._engine.data.dialect.delimiter
EDIT: Forgot the iterator=True argument.
Upvotes: 14
Reputation: 21676
csv.Sniffer
The Sniffer class is used to deduce the format of a CSV file.
sniff(sample, delimiters=None)
Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.
Dialect.delimiter
A one-character string used to separate fields. It defaults to ','.
import csv

sniffer = csv.Sniffer()
dialect = sniffer.sniff('first, second, third, fourth')
# print is a function in Python 3
print(dialect.delimiter)
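The Dialect returned by sniff can also be handed straight to csv.reader, so the detected delimiter is applied without copying it by hand. The following is a small sketch under that assumption; the sample string and the variable names are made up for illustration.

```python
import csv
import io

# Illustrative in-memory sample; a real file would be opened with newline=''.
sample = "first;second;third\n1;2;3\n4;5;6\n"

# Limit the candidates to the two separators we expect to see.
dialect = csv.Sniffer().sniff(sample, delimiters=";,")

# csv.reader accepts the sniffed Dialect subclass directly.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(dialect.delimiter)
print(rows)
```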
Upvotes: 1