Toni Piza

Reputation: 515

Multiple quotechars in pandas

I want to parse an nginx access log with the pandas read_csv function. I'm using the following code:

pd.read_csv('lb-access_cache.log', delim_whitespace=True, quotechar='"')

Is it possible to specify more than one quotechar, so that elements inside round brackets or square brackets are also treated as single columns?

For example, from a string like the following I want to obtain 3 columns.

hello "world hello" [world is beautifull]

Upvotes: 2

Views: 1159

Answers (1)

SerialDev

Reputation: 2847

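read_csv accepts only a single quotechar, so pass a regular expression as sep instead. The one below splits on whitespace only when it lies outside double quotes and outside square brackets: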

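import pandas as pd

log_file = 'lb-access_cache.log'  # path taken from the question

df = pd.read_csv(
    log_file,
    # whitespace followed by an even number of quotes and not inside [...]
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine='python',  # regex separators need the Python parsing engine
    usecols=[0, 3, 4, 5, 6, 7, 8],  # drop columns 1 and 2
    names=['ip', 'time', 'request', 'status', 'size', 'referer', 'user_agent'],
    na_values='-',  # treat a bare '-' as missing
    header=None,
)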
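To check what the separator does without a log file, here is a minimal sketch that applies the same pattern to the sample string from the question using only the standard re module. The names SEP, sample, fields and cleaned are made up for this illustration, and the final strip step is an optional extra that is not part of the read_csv call above.

```python
import re

# Same pattern as the sep argument above
SEP = re.compile(r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])')

# Sample string from the question
sample = 'hello "world hello" [world is beautifull]'

# Split only on whitespace outside double quotes and outside [...]
fields = SEP.split(sample)
print(fields)

# Optional: remove the surrounding quotes or brackets from each field
cleaned = [f.strip('"[]') for f in fields]
print(cleaned)
```

The split yields 3 fields, with the quoted and bracketed parts each kept whole. The delimiters stay attached to the fields after the split, which is why the sketch strips them in a separate step.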

Upvotes: 3
