Reputation: 515
I want to parse an nginx access log with the read_csv function from the pandas Python library. I'm using the following code:
pd.read_csv('lb-access_cache.log', delim_whitespace=True, quotechar='"')
Is it possible to specify more than one quotechar, so that elements inside parentheses or square brackets are also treated as single columns?
For example, from a string like the following I want to obtain 3 columns.
hello "world hello" [world is beautifull]
Upvotes: 2
Views: 1159
Reputation: 2847
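This will do it. quotechar accepts only a single character, so use a regex for sep instead. The pattern splits on whitespace only when an even number of double quotes follows it (so it is outside a quoted string) and when it is not inside square brackets. Regex separators require engine='python':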
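import pandas as pd

log_file = 'lb-access_cache.log'  # file name taken from the question
df = pd.read_csv(
    log_file,
    # split on whitespace that is outside double quotes and square brackets
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine='python',  # regex separators need the Python parser
    usecols=[0, 3, 4, 5, 6, 7, 8],  # columns 1 and 2 are not used here
    names=['ip', 'time', 'request', 'status', 'size', 'referer', 'user_agent'],
    na_values='-',
    header=None,
)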
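To see what the separator does without involving pandas, here is a minimal sketch that applies the same regex with the standard re module. The helper names split_fields and unwrap, and the sample log line, are made up for illustration. The log line assumes nginx's default "combined" format, which may differ from your log_format.

```python
import re

# Same separator regex as in the read_csv call above.
SEP = re.compile(r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])')


def split_fields(line):
    """Split on whitespace that is outside double quotes and square brackets."""
    return SEP.split(line.strip())


def unwrap(field):
    """Drop one pair of enclosing double quotes or square brackets, if present."""
    if len(field) >= 2 and (field[0], field[-1]) in {('"', '"'), ('[', ']')}:
        return field[1:-1]
    return field


# The example string from the question: three fields, delimiters kept.
sample = 'hello "world hello" [world is beautifull]'
sample_fields = split_fields(sample)

# Hypothetical log line, assuming nginx's default "combined" format.
log_line = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"'
)
log_fields = [unwrap(f) for f in split_fields(log_line)]

print(sample_fields)
print(log_fields)
```

Note that a plain regex split keeps the surrounding quotes and brackets in each field, which is why the sketch strips them in a separate step. The pandas documentation likewise warns that regex delimiters are prone to ignoring quoted data, so a similar cleanup on the resulting columns may be needed.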
Upvotes: 3