Reputation: 729
I have been looking at mostly the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True
for .xlsx files, so I can not use the xlrd hyperlink_map
function. So I turned to openpyxl, but have also had no luck extracting a hyperlink from an excel file with it. Test code below (the test file contains a simple hyperlink to google with hyperlink text set to "test"):
import openpyxl
wb = openpyxl.load_workbook('testFile.xlsx')
ws = wb.get_sheet_by_name('Sheet1')
r = 0
c = 0
print ws.cell(row = r, column = c). value
print ws.cell(row = r, column = c). hyperlink
print ws.cell(row = r, column = c). hyperlink_rel_id
Output:
test
None
I guess openpyxl does not currently support formatting completely either? Is there some other library I can use to extract hyperlink information from Excel (.xlsx) files?
Upvotes: 20
Views: 37588
Reputation: 11
import openpyxl
wb = openpyxl.load_workbook('yourfile.xlsx')
ws = wb['Sheet1']
try:
print(ws.cell(row=2, column=1).hyperlink.target)
#This fail if their is no hyperlink
except:
print(ws.cell(row=2, column=1).value)
In order to handle the exception 'message': "'NoneType' object has no attribute 'target'"
, we can use it in a try/except block. So even if there are no hyperlinks available in the given cell, it will print the content contained in the cell.
Upvotes: 1
Reputation: 15777
This is possible with openpyxl:
import openpyxl
wb = openpyxl.load_workbook('yourfile.xlsm')
ws = wb['Sheet1']
# This will fail if there is no hyperlink to target
print(ws.cell(row=2, column=1).hyperlink.target)
Upvotes: 32
Reputation: 1
If instead of just .hyperlink, doing .hyperlink.target should work. I was getting a 'None' as well from using just ".hyperlink" on the cell object before that.
Upvotes: 0
Reputation: 406
Starting from at least version openpyxl-2.4.0b1 this bug https://bitbucket.org/openpyxl/openpyxl/issue/152/hyperlink-returns-empty-string-instead-of was fixed. Now it's return for cell Hyperlink object:
hl_obj = ws.row(col).hyperlink # getting Hyperlink object for Cell
#hl_obj = ws.cell(row = r, column = c).hyperlink This could be used as well.
if hl_obj:
print(hl_obj.display)
print(hl_obj.target)
print(hl_obj.tooltip) # you can see it when hovering mouse on hyperlink in Excel
print(hl_obj) # to see other stuff if you need
Upvotes: 4
Reputation: 19497
For direct manipulation of Excel files it's also worth looking at the excellent XlWings library.
Upvotes: 1
Reputation: 473803
FYI, the problem with openpyxl
is an actual bug.
And, yes, xlrd
cannot read the hyperlink without formatting_info
, which is currently not supported for xlsx
.
Upvotes: 3
Reputation: 798
A successful solution I've worked with is to install unoconv on the server and implement a method that invokes this command line tool via the subprocess module to convert the file from xlsx to xls since hyperlink_map.get() works with xls.
Upvotes: 1
Reputation: 34
In my experience getting good .xlsx interaction requires moving to IronPython. This lets you work with the Common Language Runtime (clr) and interact directly with excel'
import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()
wb = excel.Workbooks.Open('testFile.xlsx')
ws = wb.Worksheets['Sheet1']
address = ws.Cells(row, col).Hyperlinks.Item(1).Address
Upvotes: 1