Reputation: 41
I'm trying to extract from an Excel file the content of a cell (essentially text) and the format of the text. The Excel I am dealing with looks like the screenshot below :
Texts in cells may be bold, italic or stroke and I need to extract the text and the format into a python string. For example, if a cell contains :
- line 1
- line 2
- line 3
I would like to obtain a string Python that's look like :
- line 1 /n- **line 2**/n- *line 3**
in order to get the text and the format information.
I've tried to find a solution using openpyxl, but it seems that it is only possible to apply and not extract format cell. Library xlrd seems not fitted for xlsx. I am currently trying with pyexcel library.
Do you have any idea ? Thanks.
Upvotes: 4
Views: 7171
Reputation: 11
Using openpyxl you can extract both the style of the entire cell (as shown in a previous answer) and parts of the text in each cell if you set the rich_text
flag to True
when loading the workbook.
Here is an example in python 3.10 that prints out all italicized text:
from openpyxl import load_workbook
from openpyxl.cell.rich_text import CellRichText, TextBlock
# The 'rich_text=True' parameter is required otherwise the cells are
workbook = load_workbook('trash/test.xlsx', rich_text=True)
# Assume you're working with the first sheet
sheet = workbook.active
for row in sheet.iter_rows():
for cell in row:
# Check if the entire cell is italicized
if cell.font.italic:
print(f"Cell {cell.coordinate} is completely italicized: {cell.value}")
# cell.value will either be CellRichText or str, with CellRichText having more formatting that needs to be checked.
if isinstance(cell.value, CellRichText):
for text_block in cell.value:
# Ensure it's a text block not a plain string, and that it is in fact italicized
if isinstance(text_block, TextBlock) and text_block.font.italic:
print(f"Cell {cell.coordinate} contains italicized text: {text_block.text}")
workbook.close()
This example will print out all the italicized text and what cell that text is in. For completely italicized cells it will print out a single entry for that cell, for cells that are interspersed with formatting it will print out multiple entries for every block of text that is italicized. This means that if a cell is completely italicized and has some other text formatting in it it will print out multiple entries with some caveats.
If you have "this is an example" in cell A1 you get this output from the script:
Cell A1 is completely italicized: this is an example
Cell A1 contains italicized text: an
Cell A1 contains italicized text: example
Upvotes: 1
Reputation: 41
You can import Font from openpyxl and check if a cell is written in bold by using cell.font.bold
, it gives a True or False.
cell=sheet[A2]
bold_status=cell.font.bold
italic_status=cell.font.italic
More info on the fonts in openpyxl: http://openpyxl.readthedocs.io/en/2.5/api/openpyxl.styles.fonts.html
Upvotes: 4