codette
codette

Reputation: 103

how to extract different tables in excel sheet using python

In one excel file, sheet 1 , there are 4 tables at different locations in the sheet .How to read those 4 tables . for example I have even added one picture snap from google for reference. without using indexes is there any other way to extract tables.

enter image description here

Upvotes: 4

Views: 13101

Answers (3)

Wizhi
Wizhi

Reputation: 6549

I assume your tables are formatted as "Excel Tables". You can create an excel table by mark a range and then click:

enter image description here

Then there are a good guide from Samuel Oranyeli how to import the Excel Tables with Python. I have used his code and show with examples.


I have used the following data in excel, where each color represents a table.

enter image description here


Remarks about code:

The following part can be used to check which tables exist in the worksheet that we are working with:

# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})

In our example this code will give:

{'Table2': 'A1:C18', 'Table3': 'D1:F18', 'Table4': 'G1:I18', 'Table5': 'J1:K18'}

Here you set the dataframe names. Be cautious if the number of dataframes missmatches the number of tables you will get an error.

# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()

# Print each dataframe
print(Table2.head(3)) # Print first 3 rows from df

print(Table2.head(3)) gives:

   Index   first_name   last_name     address
   0          Aleshia   Tomkiewicz    14 Taylor St
   1             Evan   Zigomalas     5 Binney St
   2           France   Andrade       8 Moor Place

Full code:

#import libraries
from openpyxl import load_workbook
import pandas as pd

# read file
wb = load_workbook("G:/Till/Tables.xlsx") # Set the filepath + filename

# select the sheet where tables are located
ws = wb["Tables"]

# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})

mapping = {}

# loop through all the tables and add to a dictionary
for entry, data_boundary in ws.tables.items():
    # parse the data within the ref boundary
    data = ws[data_boundary]
    
    ### extract the data ###
    # the inner list comprehension gets the values for each cell in the table
    content = [[cell.value for cell in ent] 
               for ent in data]
    
    header = content[0]
    
    #the contents ... excluding the header
    rest = content[1:]
    
    #create dataframe with the column names
    #and pair table name with dataframe
    df = pd.DataFrame(rest, columns = header)
    mapping[entry] = df

#  print(mapping)

# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()

# Print each dataframe
print(Table2)
print(Table3)
print(Table4)
print(Table5)

Example data, example file:

first_name last_name address city county postal
Aleshia Tomkiewicz 14 Taylor St St. Stephens Ward Kent CT2 7PP
Evan Zigomalas 5 Binney St Abbey Ward Buckinghamshire HP11 2AX
France Andrade 8 Moor Place East Southbourne and Tuckton W Bournemouth BH6 3BE
Ulysses Mcwalters 505 Exeter Rd Hawerby cum Beesby Lincolnshire DN36 5RP
Tyisha Veness 5396 Forth Street Greets Green and Lyng Ward West Midlands B70 9DT
Eric Rampy 9472 Lind St Desborough Northamptonshire NN14 2GH
Marg Grasmick 7457 Cowl St #70 Bargate Ward Southampton SO14 3TY
Laquita Hisaw 20 Gloucester Pl #96 Chirton Ward Tyne & Wear NE29 7AD
Lura Manzella 929 Augustine St Staple Hill Ward South Gloucestershire BS16 4LL
Yuette Klapec 45 Bradfield St #166 Parwich Derbyshire DE6 1QN
Fernanda Writer 620 Northampton St Wilmington Kent DA2 7PP
Charlesetta Erm 5 Hygeia St Loundsley Green Ward Derbyshire S40 4LY
Corrinne Jaret 2150 Morley St Dee Ward Dumfries and Galloway DG8 7DE
Niesha Bruch 24 Bolton St Broxburn, Uphall and Winchburg West Lothian EH52 5TL
Rueben Gastellum 4 Forrest St Weston-Super-Mare North Somerset BS23 3HG
Michell Throssell 89 Noon St Carbrooke Norfolk IP25 6JQ
Edgar Kanne 99 Guthrie St New Milton Hampshire BH25 5DF

Upvotes: 6

Ama
Ama

Reputation: 1575

You need two things:

  1. Access OpenXML data via python: https://github.com/python-openxml/python-xlsx
  2. Find the tables in the file, via what is called a DefinedName: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.definedname?view=openxml-2.8.1

Upvotes: 0

Mohammad Farseen
Mohammad Farseen

Reputation: 103

You may convert your excel sheet to csv file and then use csv module to grab rows.

    import pandas as pd
    read_file  = pd.read_excel("Test.xlsx")
    read_file.to_csv ("Test.csv",index = None,header=True) 
    enter code here
    df = pd.DataFrame(pd.read_csv("Test.csv")) 
    print(df)

For better approch please provide us sample excel file

Upvotes: -2

Related Questions