EBAH
EBAH

Reputation: 99

pandas field separator and double quotes

I need to load in pandas a CSV file that is not 100% CSV "compliant", below an example:

"Transaction date";"Accounting date";"Counterparty's data";"Title"
2021-08-22;2021-08-22;" "SPOLEM" ASS "ALDONA" AUGUSTOW ";" Title 450"
2019-09-02;2019-09-02;" 13XYZ05 "SKOWRONEK" NIEGOWA ";" Title 1300"
2010-07-18;2010-07-18;" APTEKA "SLOWINSKA" SPOLKALEBA ";" Title 123"

I read this csv file (csv_in) into a pandas data frame with the following command:

df = pd.read_csv(csv_in, \
                 delimiter=';', \
                 engine='python', \
                 quoting=1)

I understand that the bad csv formatting is the culprit, but:

Thanks! Evan

Upvotes: 1

Views: 150

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195418

You can try to clean your data by escaping the inner quotes " by \:

import re
import pandas as pd
from io import StringIO


with open("your_data.csv", "r") as f_in:
    data = f_in.read()

data = re.sub(
    r'"([^;]+)"',
    lambda g: '"' + g.group(1).replace('"', r"\"").strip() + '"',
    data,
)

df = pd.read_csv(StringIO(data), delimiter=";", quotechar='"', escapechar="\\")
print(df)

This prints:

  Transaction date Accounting date             Counterparty's data       Title
0       2021-08-22      2021-08-22  "SPOLEM" ASS "ALDONA" AUGUSTOW   Title 450
1       2019-09-02      2019-09-02     13XYZ05 "SKOWRONEK" NIEGOWA  Title 1300
2       2010-07-18      2010-07-18   APTEKA "SLOWINSKA" SPOLKALEBA   Title 123

Upvotes: 1

Naveed
Naveed

Reputation: 11650

specify the quotechar and remove the engine attributes (resulting in C being default)


pd.read_csv("csv2.txt", \
                 delimiter=';', \
                 quotechar='"')
    Transaction date    Accounting date     Counterparty's data     Title
0   2021-08-22  2021-08-22  SPOLEM" ASS "ALDONA" AUGUSTOW "     Title 450
1   2019-09-02  2019-09-02  13XYZ05 SKOWRONEK" NIEGOWA "    Title 1300
2   2010-07-18  2010-07-18  APTEKA SLOWINSKA" SPOLKALEBA "  Title 123

Upvotes: 1

Related Questions