Steve DEU
Steve DEU

Reputation: 167

Python Pandas: Highlight matching text and row

I’m trying to change the font color to red for any strings in df1 that match values in df3 and highlight the row. I couldn't find any information about changing font color. The data sets are:

df1 = [ ‘i like to shop at store a.’ , ‘he likes to shop at the store b.’, ‘she is happy to shop at store c.’, 'we want to shop at the store d.']
df2 = [ ‘store a’, ‘store b’, ‘store c’, 'store d' ]
df3 = [ ‘like to’, ‘likes to shop’, ‘at store’ ]

I'm using the following:

myDataSet = list(zip(df1,df2))
df = pd.DataFrame(data = myDataSet, columns=[‘df1’, ‘df2’]

The output should look like:

enter image description here

Please help!

Upvotes: 7

Views: 8312

Answers (3)

Matt07
Matt07

Reputation: 523

It is accomplishable inside a Jupyter Notebook using HTML formatting, as @Ywapom suggested. Please check his answer as well.

import re
from IPython.display import HTML

def display_highlighted_words(df, keywords):
    head = """
    <table>
        <thead>
            """ + \
            "".join(["<th> %s </th>" % c for c in df.columns])\
            + """
        </thead>
    <tbody>"""

    for i,r in df.iterrows():
        row = "<tr>"
        for c in df.columns:
            matches = []
            for k in keywords:
                for match in re.finditer(k, str(r[c])):
                    matches.append(match)
        
            # reverse sorting
            matches = sorted(matches, key = lambda x: x.start(), reverse=True)
        
            # building HTML row
            cell = str(r[c])
            for match in matches:
                cell = cell[:match.start()] +\
                    "<span style='color:red;'> %s </span>" % cell[match.start():match.end()] +\
                    cell[match.end():]
            row += "<td> %s </td>" % cell
                
            row += "</tr>"
        head += row

    head += "</tbody></table>"
    display(HTML(head))

Then, with tan example DataFrame like this one

df = pd.DataFrame([["Franco color Franco",1], 
                   ["Franco Franco Ciccio Franco",2], 
                   ["Ciccio span",3]], columns=["A", "B"])
display_highlighted_words(df, ["Franco", "Ciccio"])

the result is the following.

Sample result

The above code could be easily extended to have the keywords vector to be selected from a column of the dataset, as the original question was asking.

Upvotes: 5

Turo
Turo

Reputation: 1607

You can format conditionally rows or cells containing this text, e.g. as per below example. I don't think you will be able to highlight in red just the part of the text (well, unless you want to go into significant hacks of reparsing entire html, which I'm not even sure would be possible). Refer to the Styler documentation.

import pandas as pd

df1 = [ 'i like to shop at store a.' , 'he likes to shop at the store b.', 'she is happy to shop at store c.', 'we want to shop at the store d.']
df2 = [ 'store a', 'store b', 'store c', 'store d' ]
df3 = [ 'like to', 'likes to shop', 'at store' ]

myDataSet = list(zip(df1,df2))
df = pd.DataFrame(data = myDataSet, columns=['df1', 'df2'])

def in_statements(val):
    for statement in df3:
        if statement in val:
            color = 'yellow'
            break
        else:
            color = 'black'
    return 'background-color: %s' % color

df = df.style.applymap(in_statements)

df

Why deal with the styling fuzz anyways? :) not better just to add an extra column which extracts the text which is interesting for you? (or blank if it's not there)

Edit: Per request, a way to achieve the objective without styling limitations, by adding an extra column:

def check(df):
    df["Statements"] = ", ".join(
        [x for x in df3 if x in df["df1"].to_string()])
    return df

df = df.groupby("df1").apply(lambda dfx: check(dfx))
df

Upvotes: 4

Ywapom
Ywapom

Reputation: 603

You can try css to style the dataframe.

Below is some text from https://pandas.pydata.org/pandas-docs/stable/style.html

You can apply conditional formatting, the visual styling of a DataFrame depending on the data within, by using the DataFrame.style property. This is a property that returns a Styler object, which has useful methods for formatting and displaying DataFrames.

The styling is accomplished using CSS. You write “style functions” that take scalars, DataFrames or Series, and return like-indexed DataFrames or Series with CSS "attribute: value" pairs for the values. These functions can be incrementally passed to the Styler which collects the styles before rendering.

Upvotes: 2

Related Questions