Reputation: 84
I’m trying to read a large, unfamiliar CSV file with pandas. I ran into some errors, so I added the following arguments:
df = pd.read_csv(csv_file, engine="python", error_bad_lines=False, warn_bad_lines=True)
It works well: the offending lines are skipped, and the errors are printed to the terminal, for example:
Skipping line 31175: field larger than field limit (131072)
However, I’d like to save all errors to a variable instead of printing them. How can I do it?
Note that this is part of a large program, so I can't redirect all log output from file=sys.stdout to something else. I need a solution specific to this call.
Thanks!
Upvotes: 2
Views: 640
Reputation: 24603
Use the on_bad_lines parameter with a callable instead (callable support is available in pandas 1.4+ and requires engine="python"):
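import pandas as pd

badlines_list = []

def badlines_collect(bad_line: list[str]) -> None:
    # Store each bad line (already split into fields) instead of printing it.
    badlines_list.append(bad_line)
    return None  # returning None tells pandas to skip the line

df = pd.read_csv(csv_file, engine="python", on_bad_lines=badlines_collect)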
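To see what the callable collects, here is a minimal, self-contained sketch. It assumes pandas 1.4 or later; the in-memory CSV text and the names csv_text, bad_lines and collect_bad_line are made up for illustration and are not from the question.

```python
import io

import pandas as pd

# Hypothetical sample data: the third line has one field too many.
csv_text = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

bad_lines = []  # every rejected line ends up here instead of on the terminal


def collect_bad_line(bad_line):
    # pandas passes the offending line as a list of strings split on the separator.
    bad_lines.append(bad_line)
    return None  # None means "skip this line"


df = pd.read_csv(
    io.StringIO(csv_text),
    engine="python",  # callables are only supported by the python engine
    on_bad_lines=collect_bad_line,
)

print(bad_lines)
print(df)
```

The callable receives only the split fields, not the line number or the error message, so store extra context yourself if you need it. The pandas documentation describes the callable in terms of lines with too many fields; whether lines rejected with "field larger than field limit" also reach it is not confirmed here, so check that against your own file.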
Upvotes: 1