Itayst

Reputation: 84

Save errors to a variable while reading csv file

I’m trying to read a large CSV file of unknown structure with pandas. I ran into some errors, so I added the following arguments:

df = pd.read_csv(csv_file, engine="python", error_bad_lines=False, warn_bad_lines=True)

This works well and skips the offending lines, and the errors are printed to the terminal correctly, for example:

Skipping line 31175: field larger than field limit (131072)

However, I’d like to save all errors to a variable instead of printing them. How can I do it?

Note that this is part of a big program, and I can't redirect all of its logging away from file=sys.stdout. I need a case-specific solution.

Thanks!

Upvotes: 2

Views: 640

Answers (1)

eshirvana

Reputation: 24603

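Use the callable form of on_bad_lines instead (available in pandas 1.4+, and only with engine="python"); error_bad_lines and warn_bad_lines have been deprecated since pandas 1.3. pandas calls the function with each bad line, split into a list of fields, so you can collect the lines in a list: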

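import pandas as pd

badlines_list = []

def badlines_collect(bad_line: list[str]) -> None:
    # Called once per bad line with its fields already split; returning None skips the line
    badlines_list.append(bad_line)
    return None

df = pd.read_csv(csv_file, engine="python", on_bad_lines=badlines_collect)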
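
To see what ends up in the list, here is a minimal self-contained sketch of the same approach. The in-memory CSV text and the names csv_text, bad_lines and collect_bad_line are made up for illustration and are not from the question; it assumes pandas 1.4 or later. Note that the callable receives the fields of the rejected row, not the warning text that used to be printed.

```python
import io

import pandas as pd

# Hypothetical sample data: the second data row has one field too many.
csv_text = "a,b\n1,2\n3,4,5\n6,7\n"

bad_lines = []


def collect_bad_line(bad_line):
    # pandas passes the offending row as a list of strings split on the separator.
    bad_lines.append(bad_line)
    return None  # returning None makes pandas skip the row


df = pd.read_csv(
    io.StringIO(csv_text),
    engine="python",  # the callable form is only supported by the python engine
    on_bad_lines=collect_bad_line,
)

print(bad_lines)  # the rows pandas rejected
print(df)         # the rows that parsed cleanly
```

If you would rather repair a row than drop it, the callable can return a list of strings instead of None, and pandas will use that list as the row.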
   

Upvotes: 1
