Reputation: 103
In one excel file, sheet 1 , there are 4 tables at different locations in the sheet .How to read those 4 tables . for example I have even added one picture snap from google for reference. without using indexes is there any other way to extract tables.
Upvotes: 4
Views: 13101
Reputation: 6549
I assume your tables are formatted as "Excel Tables". You can create an excel table by mark a range and then click:
Then there are a good guide from Samuel Oranyeli how to import the Excel Tables with Python. I have used his code and show with examples.
I have used the following data in excel, where each color represents a table.
Remarks about code:
The following part can be used to check which tables exist in the worksheet that we are working with:
# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})
In our example this code will give:
{'Table2': 'A1:C18', 'Table3': 'D1:F18', 'Table4': 'G1:I18', 'Table5': 'J1:K18'}
Here you set the dataframe names. Be cautious if the number of dataframes missmatches the number of tables you will get an error.
# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()
# Print each dataframe
print(Table2.head(3)) # Print first 3 rows from df
print(Table2.head(3))
gives:
Index first_name last_name address 0 Aleshia Tomkiewicz 14 Taylor St 1 Evan Zigomalas 5 Binney St 2 France Andrade 8 Moor Place
Full code:
#import libraries
from openpyxl import load_workbook
import pandas as pd
# read file
wb = load_workbook("G:/Till/Tables.xlsx") # Set the filepath + filename
# select the sheet where tables are located
ws = wb["Tables"]
# check what tables that exist in the worksheet
print({key : value for key, value in ws.tables.items()})
mapping = {}
# loop through all the tables and add to a dictionary
for entry, data_boundary in ws.tables.items():
# parse the data within the ref boundary
data = ws[data_boundary]
### extract the data ###
# the inner list comprehension gets the values for each cell in the table
content = [[cell.value for cell in ent]
for ent in data]
header = content[0]
#the contents ... excluding the header
rest = content[1:]
#create dataframe with the column names
#and pair table name with dataframe
df = pd.DataFrame(rest, columns = header)
mapping[entry] = df
# print(mapping)
# Extract all the tables to individually dataframes from the dictionary
Table2, Table3, Table4, Table5 = mapping.values()
# Print each dataframe
print(Table2)
print(Table3)
print(Table4)
print(Table5)
Example data, example file:
first_name | last_name | address | city | county | postal |
---|---|---|---|---|---|
Aleshia | Tomkiewicz | 14 Taylor St | St. Stephens Ward | Kent | CT2 7PP |
Evan | Zigomalas | 5 Binney St | Abbey Ward | Buckinghamshire | HP11 2AX |
France | Andrade | 8 Moor Place | East Southbourne and Tuckton W | Bournemouth | BH6 3BE |
Ulysses | Mcwalters | 505 Exeter Rd | Hawerby cum Beesby | Lincolnshire | DN36 5RP |
Tyisha | Veness | 5396 Forth Street | Greets Green and Lyng Ward | West Midlands | B70 9DT |
Eric | Rampy | 9472 Lind St | Desborough | Northamptonshire | NN14 2GH |
Marg | Grasmick | 7457 Cowl St #70 | Bargate Ward | Southampton | SO14 3TY |
Laquita | Hisaw | 20 Gloucester Pl #96 | Chirton Ward | Tyne & Wear | NE29 7AD |
Lura | Manzella | 929 Augustine St | Staple Hill Ward | South Gloucestershire | BS16 4LL |
Yuette | Klapec | 45 Bradfield St #166 | Parwich | Derbyshire | DE6 1QN |
Fernanda | Writer | 620 Northampton St | Wilmington | Kent | DA2 7PP |
Charlesetta | Erm | 5 Hygeia St | Loundsley Green Ward | Derbyshire | S40 4LY |
Corrinne | Jaret | 2150 Morley St | Dee Ward | Dumfries and Galloway | DG8 7DE |
Niesha | Bruch | 24 Bolton St | Broxburn, Uphall and Winchburg | West Lothian | EH52 5TL |
Rueben | Gastellum | 4 Forrest St | Weston-Super-Mare | North Somerset | BS23 3HG |
Michell | Throssell | 89 Noon St | Carbrooke | Norfolk | IP25 6JQ |
Edgar | Kanne | 99 Guthrie St | New Milton | Hampshire | BH25 5DF |
Upvotes: 6
Reputation: 1575
You need two things:
DefinedName
: https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.definedname?view=openxml-2.8.1Upvotes: 0
Reputation: 103
You may convert your excel sheet to csv file and then use csv module to grab rows.
import pandas as pd
read_file = pd.read_excel("Test.xlsx")
read_file.to_csv ("Test.csv",index = None,header=True)
enter code here
df = pd.DataFrame(pd.read_csv("Test.csv"))
print(df)
For better approch please provide us sample excel file
Upvotes: -2