How to parse a complex text file using Python string methods or regex and export into tabular form

Question

As the title says, I don't quite understand how to extract the data I need for my table. The columns I need are Date, Time, Courtroom, File Number, Defendant Name, Attorney, Bond, Charge, etc.

I think regex is what I need, but my class did not cover it, so I am unsure how to parse the file and output the correct data as an organized table.
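For reference, here is a minimal sketch of what regex extraction with named groups might look like. The sample line and its layout are invented for illustration and are not taken from the actual court file, so the pattern would need to be adapted to the real format. `SAMPLE_LINE`, `LINE_PATTERN`, and `parse_line` are hypothetical names.

```python
import re

# Hypothetical line layout (an assumption); the real court file may differ.
SAMPLE_LINE = "21CR000123  DOE,JOHN  ATTY: SMITH,JANE  BOND: $5,000"

# Named groups let each captured field be looked up by a column-like name.
LINE_PATTERN = re.compile(
    r"(?P<file_number>\d{2}CR\d{6})\s+"
    r"(?P<defendant>[A-Z]+,[A-Z]+)\s+"
    r"ATTY:\s*(?P<attorney>[A-Z]+,[A-Z]+)\s+"
    r"BOND:\s*\$(?P<bond>[\d,]+)"
)


def parse_line(line):
    """Return a dict of captured fields if the line matches, otherwise None."""
    match = LINE_PATTERN.search(line)
    return match.groupdict() if match else None


print(parse_line(SAMPLE_LINE))
```

Each matching line becomes a dictionary keyed by field name, which is the shape a table row needs; lines that do not match return None and can be skipped or handled separately.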

I am supposed to turn my text file from this

https://pastebin.com/ZM8EPu0p

and export it into a more readable format like this (example output is below).

Here is what I have so far.

def readFile(court):
    csv_rows = []
    entry = {}  # fields of the record currently being built
    # read the txt file and split it into chunks of data by paragraph
    with open(court, "r") as file:
        data_chunks = file.read().split("\n\n")

    for chunk in data_chunks:
        chunk = chunk.strip()  # .strip() removes surrounding whitespace
        if chunk[:4].isnumeric():  # if first 4 characters are digits
            entry = {}  # initialize an empty dictionary
        elif (
            not chunk and entry
        ):  # if we're on an empty chunk and the entry dict is not empty
            csv_rows.append(entry)  # store the finished entry as a row
            entry = {}
        else:

            # parse here?

            print(chunk)

    return csv_rows


readFile("/Users/mia/Desktop/School/programming/court.txt")
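For the export step, a rough sketch of how a list of dictionaries such as `csv_rows` could be written out with `csv.DictWriter` from the standard library. The column names come from the table described above; `write_rows`, the output filename, and the sample row are hypothetical and only there to make the example runnable.

```python
import csv

# Column names taken from the table described in the question.
FIELDNAMES = [
    "Date", "Time", "Courtroom", "File Number",
    "Defendant Name", "Attorney", "Bond", "Charge",
]


def write_rows(rows, out_path):
    """Write a list of dicts to a CSV file, one dict per table row."""
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="") as out_file:
        writer = csv.DictWriter(
            out_file, fieldnames=FIELDNAMES, dialect="excel", restval=""
        )
        writer.writeheader()
        writer.writerows(rows)  # keys missing from a row are filled with restval


if __name__ == "__main__":
    # Made-up sample row, only to show the expected shape of the data.
    sample = [{"Date": "01/02/2021", "Courtroom": "0001", "Bond": "5,000"}]
    write_rows(sample, "court_sample.csv")
```

Keeping the parsing and the writing in separate functions means the parser only has to return dictionaries, and the CSV layout can change without touching it.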
