Leo Bouloc

Reputation: 311

Retrieve delimiter inferred by read_csv in pandas

When reading CSV files with automatic separator detection (pd.read_csv(file_path, sep=None)), pandas tries to infer the delimiter (or separator).

Is there a way to retrieve the result of this inference (the value that was finally used for sep)?

EDIT

I am looking specifically for a method that uses the pandas object returned by read_csv. I am using pandas version 0.20.2.

Upvotes: 19

Views: 15604

Answers (4)

cs95

Reputation: 402813

If all you want to do is detect the dialect of a CSV file (without loading your data), you can use the built-in csv.Sniffer class from the standard library:

The Sniffer class is used to deduce the format of a CSV file.

In particular, the sniff method:

sniff(sample, delimiters=None)

Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.

Here's an example of its usage:

import csv

with open('example.csv', 'r', newline='') as csvfile:
    # sniff() only needs a text sample; here it sees just the first line
    dialect = csv.Sniffer().sniff(csvfile.readline())
    print(dialect.delimiter)
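To get both the delimiter and the DataFrame without touching pandas internals, one option is to sniff a sample yourself and pass the result to read_csv explicitly. This is a sketch, not pandas' own mechanism: `read_csv_with_sniffed_sep` and `sample_size` are hypothetical names, and the CSV is passed as a string only to keep the example self-contained. As far as I can tell from the pandas source, the python engine sniffs only the first line it reads, so sniffing a larger sample may occasionally choose a different delimiter than sep=None would.

```python
import csv
import io

import pandas as pd


def read_csv_with_sniffed_sep(text, sample_size=1024):
    """Sniff a delimiter from the start of `text`, then parse with it.

    Returns (DataFrame, delimiter). Raises csv.Error if sniffing fails.
    """
    sample = text[:sample_size]
    delimiter = csv.Sniffer().sniff(sample).delimiter
    # Passing sep explicitly means we already know which separator was used
    df = pd.read_csv(io.StringIO(text), sep=delimiter)
    return df, delimiter


df, sep = read_csv_with_sniffed_sep("a;b;c\n1;2;3\n4;5;6\n")
print(repr(sep))  # ';'
print(df.shape)   # (2, 3)
```

For a real file you would read the sample with something like f.read(1024), call f.seek(0), and then hand the file object to read_csv.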

Upvotes: 10

greendino

Reputation: 457

Based on RHSmith159's answer, you can do it like this. With iterator=True, read_csv returns a TextFileReader instead of a DataFrame, so you have to concatenate its chunks afterwards to get the DataFrame itself. You may also need to set engine='python' to avoid a ParserWarning, as in the code below:

    import pandas as pd

    reader = pd.read_csv(filename, sep=None, iterator=True, engine='python')
    df = pd.concat(reader)  # collect the chunks into a single DataFrame
    # _engine is a private attribute and may change between pandas versions
    delimiter = reader._engine.data.dialect.delimiter
    print(delimiter)
    print(df)

Upvotes: -1

RHSmith159

Reputation: 1592

I think you can do this without having to import csv:

import pandas as pd

reader = pd.read_csv(file_path, sep=None, iterator=True)
# private attribute: the python parser pandas falls back to when sep=None
inferred_sep = reader._engine.data.dialect.delimiter

EDIT:

Forgot the iterator = True argument.
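Because _engine.data.dialect is private, it is an assumption that this chain exists in whatever pandas version you run. A defensive variant is sketched below: `inferred_sep` is a hypothetical helper name, it returns None instead of raising when any link in the chain is missing, and it uses TextFileReader.read() rather than pd.concat to get the DataFrame.

```python
import io

import pandas as pd


def inferred_sep(reader):
    """Best-effort lookup of the delimiter sniffed by the python engine.

    Walks the private chain reader._engine.data.dialect.delimiter and
    returns None instead of raising if any link is missing.
    """
    engine = getattr(reader, "_engine", None)
    data = getattr(engine, "data", None)
    dialect = getattr(data, "dialect", None)
    return getattr(dialect, "delimiter", None)


buf = io.StringIO("a;b;c\n1;2;3\n4;5;6\n")
# engine="python" avoids the ParserWarning about falling back from the C engine
reader = pd.read_csv(buf, sep=None, engine="python", iterator=True)
sep = inferred_sep(reader)
df = reader.read()  # read all remaining rows into a DataFrame
reader.close()
print(sep, df.shape)
```

Checking for None lets calling code fall back to csv.Sniffer if a future pandas release reorganizes its parser internals.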

Upvotes: 14

Chankey Pathak

Reputation: 21676

csv.Sniffer

The Sniffer class is used to deduce the format of a CSV file.

sniff(sample, delimiters=None)

Analyze the given sample and return a Dialect subclass reflecting the parameters found. If the optional delimiters parameter is given, it is interpreted as a string containing possible valid delimiter characters.


Dialect.delimiter

A one-character string used to separate fields. It defaults to ','

import csv

sniffer = csv.Sniffer()
dialect = sniffer.sniff('first, second, third, fourth')
print(dialect.delimiter)  # ','
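When the sample contains no recognizable delimiter, sniff() raises csv.Error. The sketch below restricts the candidates through the delimiters parameter and falls back to a default; `sniff_delimiter` is a hypothetical helper, and the candidate set ",;\t|" and the ',' fallback are illustrative choices, not library defaults.

```python
import csv


def sniff_delimiter(sample, candidates=",;\t|", default=","):
    """Return the delimiter csv.Sniffer finds among `candidates`, else `default`."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # sniff() raises csv.Error when it cannot determine a delimiter
        return default


print(sniff_delimiter("first, second, third, fourth"))  # ','
print(sniff_delimiter("x|y|z\n1|2|3\n"))                # '|'
print(sniff_delimiter("justoneword"))                   # ','
```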

Upvotes: 1
