Reputation: 145
I have a CSV file that contains multiple tables. Each table has a title, and variable number of rows and columns (these numbers may vary between files). The titles, as well as names of rows and columns may also change between different files I will need to parse in the future, so I cannot hardcode them. some columns may contain empty cells as well.
Here is a screenshot of an example CSV file with this structure:
I need to find a solution that will parse all the tables from the CSV into Pandas DFs. Ideally the final output would be an Excel file, where each table is saved as a sheet, and the name of each sheet will be the corresponding table title.
I tried the suggested solution in this post but it kept failing in identifying the start/end of the tables. When I used a simpler version of the input csv file, the suggested code only returned one table.
I would appreciate any assistance!!
Upvotes: 1
Views: 1971
Reputation: 13518
You could try this:
df = pd.read_csv("file.csv")
dfs = []
start = 0
for i, row in df.iterrows():
if all(row.isna()): # Empty row
# Remove empty columns
temp_df = df.loc[start:i, :].dropna(how="all", axis=1)
if start: # Grab header, except for first df
new_header = temp_df.iloc[0]
temp_df = temp_df[1:]
temp_df.columns = new_header
temp_df = temp_df.dropna(how="all", axis=0)
dfs.append(temp_df)
start = i + 1
Then, you can reach each df by calling dfs[0]
, dfs[1]
, ...
Upvotes: 1