Reputation: 793
I have an Excel sheet that has multiple tabs, and each individual tab has multiple tables in it. I want to read the file in such a way that it reads each table from each tab of the sheet. For instance:
Tab1 has five tables in it.
Tab2 has ten tables in it.
.....
.....
I want to read each of these tables into a pandas DataFrame and then save it to a SQL database. I already know how to read multiple tabs from the Excel sheet.
Can anyone help me out here or point me in a direction where I can find a lead?
The tables in each tab are pre-defined and have names. This is how it looks in each tab (screenshot: "Tab from excel sheet").
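(For context, reading every tab at once and writing a frame to SQL can look roughly like the sketch below; the SQLite connection string and file name are placeholders.)
import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string and file name
engine = create_engine("sqlite:///tables.db")

# sheet_name=None returns a dict: {tab name: DataFrame of the whole tab}
all_tabs = pd.read_excel("test.xlsx", sheet_name=None, header=None)

for tab_name, frame in all_tabs.items():
    # each frame still holds every table of that tab stacked together,
    # which is exactly the part I need help splitting apart
    frame.to_sql(tab_name, engine, if_exists="replace", index=False)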
Upvotes: 1
Views: 1216
Reputation: 28644
You will probably have to tweak this to match your data; imagine, for instance, that some tables sit below others on the same sheet. Hopefully it points you in the right direction. Also, note the number of for loops I used; I believe you can do better and optimize it further.
import pandas as pd
from openpyxl import load_workbook
from collections import defaultdict
from itertools import product, groupby
from operator import itemgetter

wb = load_workbook(filename="test.xlsx")
sheet = wb["Sheet1"]

green_rows = defaultdict(list)
rest_data = []
for row in sheet:
    for cell in row:
        # look for the green rows; they contain the headers
        if cell.fill.fgColor.rgb == "FFA2D722":
            # take advantage of the fact that the header
            # is the first entry in that row
            if cell.value:
                val = cell.value
            green_rows[(val, cell.row)].append(cell.column)
        else:
            if cell.value not in (None, ""):  # so the 0s are not lost
                rest_data.append((cell.row, cell.column, cell.value))
# get the max and minimum column positions
# note the addition of 1 to the max,
# this is necessary when iterating to sort the data
# in the next section
green_rows = [
(name, row, range(min(value), max(value) + 1))
for (name, row), value in green_rows.items()
]
box = []
# here the green rows and the rest of the data
# are combined, then filtered for the respective
# sections
combo = product(green_rows, rest_data)
for (header, header_row, header_column_range), (
    cell_row,
    cell_column,
    cell_value,
) in combo:
    # this is where the filtration occurs
    if (header_row < cell_row) and (cell_column in header_column_range):
        box.append((header, cell_row, cell_column, cell_value))
final = defaultdict(list)
content = groupby(box, itemgetter(1, 0))
# another iteration to get the final result
for key, value in content:
    final[key[-1]].append([val[-1] for val in value])
You can then create a DataFrame for each of the headers:
pd.DataFrame(final["Address Association"])
    0                         1                   2                             3           4                         5
 0  Column Name in DB         Name                Description                   SortOrder   BusinessMeaningName       Obsolete
 1  Field Type                nvarchar(100)       nvarchar(255)                 int         nvarchar(50)              bit
 2  Mandatory                 Yes                 Yes                           Yes         No                        Yes
 3  Foreign Key               -                   -                             -           -                         -
 4  Optional Feature          -                   -                             -           -                         -
 5  Field Name in U4SM        Name                Description                   Sort Order  Business Meaning Name     Obsolete
 6  Address.Primary           Primary             Use this address by default.  1           Address.Primary           0
 7  Address.Billing           Billing             address for billing.          2           Address.Billing           0
 8  Address.Emergency         Emergency           use this for emergency.       3           Address.Emergency         0
 9  Address.Emergency SMS     Emergency SMS       use this for emergency SMS.   4           Address.Emergency SMS     0
10  Address.Deceased          Deceased            address for deceased.         5           Address.Deceased          0
11  Address.Home              Home                address for home.             8           Address.Home              0
12  Address.Mailing           Mailing             address for mailing.          9           Address.Mailing           0
13  Address.Mobile            Mobile              use this for mobile.          10          Address.Mobile            0
14  Address.School            School              address for school.           13          Address.School            0
15  Address.SMS               SMS                 use this for SMS text.        15          Address.SMS               0
16  Address.Work              Work                address for work              16          Address.Work              0
17  Address.Permanent         Permanent           Permanent Address             17          Address.Permanent         0
18  Address.HallsOfResidence  Halls of Residence  Halls of Residence            18          Address.HallsOfResidence  0
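To finish the save-to-SQL part of the question, each recovered block can be written out with DataFrame.to_sql, and the whole parsing section can be repeated for every tab by looping over wb.sheetnames. A rough sketch, assuming SQLAlchemy and a placeholder SQLite database:
import pandas as pd
from sqlalchemy import create_engine

# placeholder connection string; swap in your own database URL
engine = create_engine("sqlite:///tables.db")

# `final` is the dict built above (header name -> list of rows) for one sheet;
# wrap the parsing code in a loop over wb.sheetnames to cover every tab
for header, rows in final.items():
    df = pd.DataFrame(rows)
    # depending on your layout you may want to promote one of the first
    # rows to column names before writing
    df.to_sql(header, engine, if_exists="replace", index=False)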
Upvotes: 1