Grand Paradis
Grand Paradis

Reputation: 41

How to extract cell format (bold, italic, ...) from an Excel file using Python?

I'm trying to extract from an Excel file the content of a cell (essentially text) and the format of the text. The Excel I am dealing with looks like the screenshot below :

screenshot of excel

Texts in cells may be bold, italic or stroke and I need to extract the text and the format into a python string. For example, if a cell contains :

  • line 1
  • line 2
  • line 3

I would like to obtain a string Python that's look like :

- line 1 /n- **line 2**/n- *line 3**

in order to get the text and the format information.

I've tried to find a solution using openpyxl, but it seems that it is only possible to apply and not extract format cell. Library xlrd seems not fitted for xlsx. I am currently trying with pyexcel library.

Do you have any idea ? Thanks.

Upvotes: 4

Views: 7171

Answers (2)

avvinogradov
avvinogradov

Reputation: 11

Using openpyxl you can extract both the style of the entire cell (as shown in a previous answer) and parts of the text in each cell if you set the rich_text flag to True when loading the workbook.

Here is an example in python 3.10 that prints out all italicized text:

from openpyxl import load_workbook
from openpyxl.cell.rich_text import CellRichText, TextBlock


# The 'rich_text=True' parameter is required otherwise the cells are 
workbook = load_workbook('trash/test.xlsx', rich_text=True)

# Assume you're working with the first sheet
sheet = workbook.active

for row in sheet.iter_rows():
    for cell in row:
        # Check if the entire cell is italicized
        if cell.font.italic:
        print(f"Cell {cell.coordinate} is completely italicized: {cell.value}")

        # cell.value will either be CellRichText or str, with CellRichText having more formatting that needs to be checked.
        if isinstance(cell.value, CellRichText):
            for text_block in cell.value:
                # Ensure it's a text block not a plain string, and that it is in fact italicized
                if isinstance(text_block, TextBlock) and text_block.font.italic:
                    print(f"Cell {cell.coordinate} contains italicized text: {text_block.text}")

workbook.close()

This example will print out all the italicized text and what cell that text is in. For completely italicized cells it will print out a single entry for that cell, for cells that are interspersed with formatting it will print out multiple entries for every block of text that is italicized. This means that if a cell is completely italicized and has some other text formatting in it it will print out multiple entries with some caveats.

If you have "this is an example" in cell A1 you get this output from the script:

Cell A1 is completely italicized: this is an example
Cell A1 contains italicized text: an
Cell A1 contains italicized text:  example
  • 1 for the entire cell
  • the "this is " seems to be skipped
  • 1 for the bolded italicized "an"
  • 1 for the italicized " example"

Upvotes: 1

Fadi
Fadi

Reputation: 41

You can import Font from openpyxl and check if a cell is written in bold by using cell.font.bold, it gives a True or False.

cell=sheet[A2]
bold_status=cell.font.bold
italic_status=cell.font.italic

More info on the fonts in openpyxl: http://openpyxl.readthedocs.io/en/2.5/api/openpyxl.styles.fonts.html

Upvotes: 4

Related Questions