Adi Lavy
Adi Lavy

Reputation: 145

Parse multiple tables of different sizes from a single csv file

I have a CSV file that contains multiple tables. Each table has a title, and variable number of rows and columns (these numbers may vary between files). The titles, as well as names of rows and columns may also change between different files I will need to parse in the future, so I cannot hardcode them. some columns may contain empty cells as well.

Here is a screenshot of an example CSV file with this structure: enter image description here

I need to find a solution that will parse all the tables from the CSV into Pandas DFs. Ideally the final output would be an Excel file, where each table is saved as a sheet, and the name of each sheet will be the corresponding table title.

I tried the suggested solution in this post but it kept failing in identifying the start/end of the tables. When I used a simpler version of the input csv file, the suggested code only returned one table.

I would appreciate any assistance!!

Upvotes: 1

Views: 1971

Answers (1)

Laurent
Laurent

Reputation: 13518

You could try this:

df = pd.read_csv("file.csv")

dfs = []
start = 0
for i, row in df.iterrows():
    if all(row.isna()):  # Empty row
        # Remove empty columns
        temp_df = df.loc[start:i, :].dropna(how="all", axis=1)
        if start:  # Grab header, except for first df
            new_header = temp_df.iloc[0]
            temp_df = temp_df[1:]
            temp_df.columns = new_header
        temp_df = temp_df.dropna(how="all", axis=0)
        dfs.append(temp_df)
        start = i + 1

Then, you can reach each df by calling dfs[0], dfs[1], ...

Upvotes: 1

Related Questions