LucasS
LucasS

Reputation: 729

Extracting Hyperlinks From Excel (.xlsx) with Python

I have been looking at mostly the xlrd and openpyxl libraries for Excel file manipulation. However, xlrd currently does not support formatting_info=True for .xlsx files, so I can not use the xlrd hyperlink_map function. So I turned to openpyxl, but have also had no luck extracting a hyperlink from an excel file with it. Test code below (the test file contains a simple hyperlink to google with hyperlink text set to "test"):

import openpyxl

wb = openpyxl.load_workbook('testFile.xlsx')

ws = wb.get_sheet_by_name('Sheet1')

r = 0
c = 0

print ws.cell(row = r, column = c). value
print ws.cell(row = r, column = c). hyperlink
print ws.cell(row = r, column = c). hyperlink_rel_id

Output:

test

None

I guess openpyxl does not currently support formatting completely either? Is there some other library I can use to extract hyperlink information from Excel (.xlsx) files?

Upvotes: 20

Views: 37588

Answers (8)

Abiya Wilson
Abiya Wilson

Reputation: 11

import openpyxl

wb = openpyxl.load_workbook('yourfile.xlsx')
ws = wb['Sheet1']

try:
    print(ws.cell(row=2, column=1).hyperlink.target)

#This fail if their is no hyperlink
except:
    print(ws.cell(row=2, column=1).value)

In order to handle the exception 'message': "'NoneType' object has no attribute 'target'", we can use it in a try/except block. So even if there are no hyperlinks available in the given cell, it will print the content contained in the cell.

Upvotes: 1

wordsforthewise
wordsforthewise

Reputation: 15777

This is possible with openpyxl:

import openpyxl

wb = openpyxl.load_workbook('yourfile.xlsm')
ws = wb['Sheet1']
# This will fail if there is no hyperlink to target
print(ws.cell(row=2, column=1).hyperlink.target)

Upvotes: 32

Ulas Bardak
Ulas Bardak

Reputation: 1

If instead of just .hyperlink, doing .hyperlink.target should work. I was getting a 'None' as well from using just ".hyperlink" on the cell object before that.

Upvotes: 0

Hellohowdododo
Hellohowdododo

Reputation: 406

Starting from at least version openpyxl-2.4.0b1 this bug https://bitbucket.org/openpyxl/openpyxl/issue/152/hyperlink-returns-empty-string-instead-of was fixed. Now it's return for cell Hyperlink object:

hl_obj = ws.row(col).hyperlink  # getting Hyperlink object for Cell
#hl_obj = ws.cell(row = r, column = c).hyperlink This could be used as well.
if hl_obj:
    print(hl_obj.display)
    print(hl_obj.target)
    print(hl_obj.tooltip) # you can see it when hovering mouse on hyperlink in Excel
    print(hl_obj) # to see other stuff if you need

Upvotes: 4

Charlie Clark
Charlie Clark

Reputation: 19497

For direct manipulation of Excel files it's also worth looking at the excellent XlWings library.

Upvotes: 1

alecxe
alecxe

Reputation: 473803

FYI, the problem with openpyxl is an actual bug.

And, yes, xlrd cannot read the hyperlink without formatting_info, which is currently not supported for xlsx.

Upvotes: 3

John Paul Hayes
John Paul Hayes

Reputation: 798

A successful solution I've worked with is to install unoconv on the server and implement a method that invokes this command line tool via the subprocess module to convert the file from xlsx to xls since hyperlink_map.get() works with xls.

Upvotes: 1

battmatt
battmatt

Reputation: 34

In my experience getting good .xlsx interaction requires moving to IronPython. This lets you work with the Common Language Runtime (clr) and interact directly with excel'

http://ironpython.net/

import clr
clr.AddReference("Microsoft.Office.Interop.Excel")
import Microsoft.Office.Interop.Excel as Excel
excel = Excel.ApplicationClass()

wb = excel.Workbooks.Open('testFile.xlsx')
ws = wb.Worksheets['Sheet1']

address = ws.Cells(row, col).Hyperlinks.Item(1).Address

Upvotes: 1

Related Questions