Akash s
Akash s

Reputation: 127

Pandas Conditional formatting by comparing the column values of dataframe

import io
import pandas as pd

csv_data = '''App_name,pre-prod,prod,stage
matching-image,nginx,nginx,nginx
mismatching-image,nginx,nginx,nginx:1.23.3-alpine'''

df = pd.read_csv(io.StringIO(csv_data), sep=",")
html_table = df.tohtml()

Is there a way to compare the values of columns in dataframe and use it in conditional formatting ? I want compare if the 'prod','pre-prod' and 'stage' values are mismatching, if yes then then its bg-color should be red. I have tired the following methods present in pandas but none of them works.

df.style.apply()

df.style.apply_index()

df.style.applymap()

Current Output:

enter image description here

Desired output:

enter image description here

Upvotes: 1

Views: 1069

Answers (3)

Henry Ecker
Henry Ecker

Reputation: 35626

It's also possible to style the entire DataFrame in one go by passing axis=None to apply.

We can identify rows which have differing values in the specified columns by comparing the first column (column 0) with the remaining columns (column 1-2) and identifying where there are unequal values using ne on axis=0.

df[['prod', 'stage']].ne(df['pre-prod'], axis=0)

#     prod  stage
# 0  False  False
# 1  False   True

Then we can check across rows for any rows which have any True values (meaning there is something that's not equal in the row).

df[['prod', 'stage']].ne(df['pre-prod'], axis=0).any(axis=1)

# 0    False
# 1     True
# dtype: bool

We can then simply apply the styles anywhere there's a True value in the resulting Series.


Altogether this could look something like:

def colour_rows_that_dont_match(df_: pd.DataFrame, comparison_cols: List[str]):
    # Sanity check that comparison_cols is what we expect
    assert isinstance(comparison_cols, list) and len(comparison_cols) > 1, \
        'Must be a list and provide at least 2 column to compare'

    # Create an Empty DataFrame to hold styles of the same shape as the original df
    styles_df = pd.DataFrame('', index=df_.index, columns=df_.columns)
    # Compare the first column's (col 0) values to the remaining columns. 
    # Find rows where any values are not equal (ne)
    rows_that_dont_match = df[comparison_cols[1:]].ne(df[comparison_cols[0]], axis=0).any(axis=1)
    # Apply styles to rows which meet the above criteria
    styles_df.loc[rows_that_dont_match, :] = 'background-color: red'
    return styles_df


df.style.apply(
    colour_rows_that_dont_match,
    # This gets passed to the function
    comparison_cols=['pre-prod', 'prod', 'stage'],
    # Apply to the entire DataFrame at once
    axis=None
).to_html(buf='test_df.html')

Which produces the following:
HTML table with second row highlighted due to differing values in the 'prod' and 'stage' columns


Setup, version, and imports:

from typing import List

import pandas as pd  # version 1.5.2

df = pd.DataFrame({
    'App_name': ['matching-image', 'mismatching-image'],
    'pre-prod': ['nginx', 'nginx'],
    'prod': ['nginx', 'nginx'],
    'stage': ['nginx', 'nginx:1.23.3-alpine']
})

Upvotes: 1

Akash s
Akash s

Reputation: 127

import io
import pandas as pd

csv_data = '''
App_name,pre-prod,prod,stage
matching-image,nginx,nginx,nginx
matching-image,nginx,nginx,nginx
mismatching-image,nginx,nginx,nginx:1.23.3-alpine
mismatching-image,nginx,nginx,nginx:1.23.3-alpine
'''
df = pd.read_csv(io.StringIO(csv_data), sep=",")
def match_checker(row):
    if row['prod'] == row['pre-prod'] == row['stage']:
        return [''] * len(row)
    else:
        return ['background-color: red'] * len(row)
        
df = df.style.apply(match_checker, axis=1)

html_table = df.to_html()

with open('testpandas.html','w+') as html_file:
    html_file.write(html_table)
    html_file.close()

Updated @PeterSmith answer.

Upvotes: 1

DSteman
DSteman

Reputation: 1658

You can add style conditionally by applying style to a subset of your dataframe like:

import io
import pandas as pd

csv_data = '''App_name,pre-prod,prod,stage
matching-image,nginx,nginx,nginx
mismatching-image,nginx,nginx,nginx:1.23.3-alpine'''

def add_color(row):
    return ['background-color: red'] * len(row)

df = pd.read_csv(io.StringIO(csv_data), sep=",")
df.loc[(df["pre-prod"] == df["prod"]) & (df["prod"] == df["stage"])].style.apply(add_color, axis=1)

Upvotes: 2

Related Questions