Reputation: 15
I am parsing an Apache log file and loading it into a pandas DataFrame for further analysis.
The log file contains some malformed lines, so the following error occurs:
ValueError: Expected 11 fields in line 4320, saw 27
To work around this, I passed error_bad_lines=False when reading the file. That does not help, because I then get the following error:
ValueError: The 'error_bad_lines' option is not supported with the 'python' engine
Note: I am explicitly using the python engine because my separator is a regular expression.
Code snippet:
data = pd.read_csv(
    log_file,
    sep=r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])',
    engine='python',
    na_values='-',
    header=None,
    usecols=use_cols,
    skiprows=1,
    converters={
        time_taken_index[0]: parse_sec,
        time_index[0]: parse_datetime,
        req_index[0]: parse_str,
        status_index[0]: parse_str,
    },
    error_bad_lines=False
)
I'd be grateful for any suggestions. Thank you.
Upvotes: 0
Views: 10072
Reputation: 2072
It seems that you are using an old version of pandas (<= 0.19.0).
The error_bad_lines=False parameter works with the python engine in pandas 0.20.0 and later, so updating pandas should resolve the error.
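If upgrading is not an option, one possible workaround is to drop the malformed lines before pandas sees them. The sketch below is only an illustration, not part of pandas: the helper name keep_well_formed, the sample log lines, and the field count of 7 are assumptions made up for the example (the log in the question expects 11 fields). It reuses the separator regex from the question and keeps only the lines that split into the expected number of fields.

```python
import io
import re

# Separator copied from the question: whitespace that is outside
# double quotes and outside square brackets.
SEP = re.compile(r'\s(?=(?:[^"]*"[^"]*")*[^"]*$)(?![^\[]*\])')


def keep_well_formed(lines, expected_fields):
    """Yield only the lines that split into exactly `expected_fields` fields.

    Hypothetical helper for illustration; it is not a pandas function.
    """
    for line in lines:
        # Strip the line terminator first, otherwise the trailing newline
        # matches \s and produces an extra empty field.
        stripped = line.rstrip("\r\n")
        if len(SEP.split(stripped)) == expected_fields:
            yield stripped


# Made-up sample data: well-formed lines have 7 fields here.
sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326\n',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /a HTTP/1.0" 200 2326 extra junk\n',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /b HTTP/1.0" 404 -\n',
]

clean = list(keep_well_formed(sample, expected_fields=7))

# An in-memory text buffer that could be handed to pd.read_csv
# in place of log_file.
buffer = io.StringIO("\n".join(clean))
```

With a real file, the lines would come from open(log_file) instead of the sample list. As a side note for newer installations: from pandas 1.3 the error_bad_lines option is deprecated in favour of on_bad_lines='skip', and it was removed in pandas 2.0, so the keyword to use depends on the installed version.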
Upvotes: 1