reading tables and usual cell data in excel separately using openpyxl

Question

I am extracting excel sheet data using python's openpyxl library .I can easily extract data but the problem is i want the 'insert table ' data which is present in special tables to be extracted separately and the other non table cell data to be extracted separately (means it ignores the table cells and only extracts the cell data)

I can read tables python using openpyxl

code :

import pandas as pd
from openpyxl import load_workbook

# Load the workbook
filename = "/Users/snikesh/Desktop/untitled folder/Input.xlsx"
wb = load_workbook(filename)

# Iterate over all sheets in the workbook
for sheet_name in wb.sheetnames:
    ws = wb[sheet_name]  # Get the sheet by name


    # Check if there are tables on the sheet
    if ws.tables:
        # Iterate over the tables and print them as dataframes
        for table in ws.tables.values():
            data = ws[table.ref]  # Get the range covered by the table
            rows = [[cell.value for cell in row] for row in data]  # Extract the data into rows of cell values
            df = pd.DataFrame(rows)  # Create a DataFrame, using the first row for column headers

            print(df)
    else:
        print("No tables found on this sheet.")

and the cell data(along with table data ) is extracted by simple python code :

import pandas as pd
from openpyxl import load_workbook

# Load the workbook
workbook_path = '/path/to/your/excel/file.xlsx'
workbook = load_workbook(workbook_path)

# Dictionary to hold dataframes for each sheet
dataframes = {}

# Loop through each sheet in the workbook
for sheet_name in workbook.sheetnames:
    worksheet = workbook[sheet_name]
    data = []
    
    # Extract data from each row in the sheet
    for row in worksheet.iter_rows(values_only=True):
        data.append(row)
    
    # Create a DataFrame and add it to the dictionary
    # Assuming the first row is the header
    df = pd.DataFrame(data)
    dataframes[sheet_name] = df

# Access a specific DataFrame
sheet_name_to_access = 'Sheet1'  # Example sheet name
print(dataframes[sheet_name_to_access])  # Print the DataFrame for 'Sheet1'

The problem is i want to to print tables separately and not with the normal data.

I am not able to modify the code such that it ignores the tables cells and print only non table cell data separately .

If i am able to do so i can extract table data using the first code.

reading tables and usual cell data in excel separately using openpyxl

Answers (1)

Related Questions