Eduardo
Eduardo

Reputation: 121

Pandas Styling with Conditional Rules

I was trying to style a Pandas Dataframe using 2 different columns. I was successful as long as the conditional was about the column itself, however when it depends on another one, I was unable to get the desired results.

I want to color the cell in "Date II" if "Date I" is in the past.

def date_pii(row):
        ret = ["" for _ in row.index]
        print(row['Date PI'])
        if row['Date PI'] < datetime.now():
            ret[row.index.get_loc("Date PII")] = "background-color: red"
            return ret

styler = df3.style \
        .applymap(lambda x: 'background-color: %s' % 'red' if x <= datetime.now() else '', subset=['Date PI']) \
        .applymap(lambda x: 'background-color: %s' % 'yellow' if x < datetime.now() + timedelta(days=30) else '',
                  subset=['Date PII']) \
        .applymap(lambda x: 'background-color: %s' % 'orange' if x <= datetime.now() else '', subset=['Date PII']) \
        .applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PI'])\
        .applymap(lambda x: 'background-color: %s' % 'grey' if pd.isnull(x) else '', subset=['Date PII'])\
        .apply(date_pii, axis=1) ----> THIS IS THE ISSUE

styler.to_excel(writer, sheet_name='Report Paris', index=False)

At runtime I get the following error:

ValueError: Function <function generate_report_all.<locals>.date_pii at 0x7fd3964d9160> returned the wrong shape.
Result has shape: (532,)
Expected shape:   (532, 10)

The dataframe looks like this: df3

The first orange cell in "Date PII" is correct, however, the remaining ones (where PI is red) I would like them to become red as well.

Thanks for the help!

Upvotes: 1

Views: 2367

Answers (2)

Attack68
Attack68

Reputation: 4785

While @HenryEcker solution works at the DataFrame level (notice his use of the axis=None keyword argument) there are times when a simpler approach might be warranted.

Since your conditions are entirely row dependent you can use apply with axis=1 and attach a function that calculates based on the columns values in each row.

For example:

df = DataFrame([[1,2,3],[3,2,1]], index=["i", "j"], columns=["A", "B", "C"])

   A   B   C
i  1   2   3
j  3   2   1

Suppose we want to highlight column C if it is less than column A:

def highlight(s):
    if s["C"] < s["A"]
        return ["", "color: red;"]
    return ["", ""]

df.style.apply(highlight, subset=["A", "C"], axis=1)

enter image description here

Upvotes: 1

Henry Ecker
Henry Ecker

Reputation: 35686

The general approach to this type of problem is to pass the specified columns as a subset to Styler.apply. This allows us to create styles at the DataFrame level and use loc indexing to build out the styles based on conditions. The other major benefit is that rather than chaining we can use the extra space to provide documentation and also reduce the overhead of all of those lambda:

def style_dates(subset_df):
    # Empty Styles
    style_df = pd.DataFrame(
        '', index=subset_df.index, columns=subset_df.columns
    )
    # Today's Date
    today = pd.Timestamp.now().normalize()
    # Date PII is within 30 days from today
    style_df.loc[
        subset_df['Date PII'].le(today + pd.Timedelta(days=30)),
        'Date PII'
    ] = 'background-color: yellow'

    # Date PI is before today
    style_df.loc[
        subset_df['Date PI'].lt(today),
        ['Date PI', 'Date PII']
    ] = 'background-color: red'

    # Date PII is before today and Date PI is after Today
    style_df.loc[
        subset_df['Date PII'].lt(today) & subset_df['Date PI'].gt(today),
        'Date PII'
    ] = 'background-color: orange'

    # Either is NaN
    style_df[subset_df.isna()] = 'background-color: gray'
    return style_df


styler = df3.style.apply(
    style_dates, axis=None, subset=['Date PII', 'Date PI']
).format(
    # Optional Explicit Date Format
    formatter='{:%Y-%m-%d}', na_rep='NaT', subset=['Date PII', 'Date PI']
)

styled DataFrame


Setup DataFrame randomly generated always relative to the current date (styles will be consistent while dates will not be):

import numpy as np
import pandas as pd
from numpy.random import Generator, MT19937

norm_today = pd.Timestamp.now().normalize()
rng = Generator(MT19937(1023))


def md(lower_bound, upper_bound, rng_=rng):
    return pd.Timedelta(days=rng_.integers(lower_bound, upper_bound))


df3 = pd.DataFrame({
    'Desc': [
        'PII within 30 days',  # PII yellow
        'PII in past and PI in future',  # PII orange
        'PI past',  # Both red
        'PI empty',  # grey
        'PII empty',  # grey
        'PII in future but not within 30 days'  # No Styles
    ],
    'Date PII': [norm_today + md(1, 10), norm_today - md(1, 10),
                 norm_today, norm_today, np.nan,
                 norm_today + md(40, 50)],
    'Date PI': [norm_today, norm_today + md(1, 10),
                norm_today - md(1, 10), np.nan, norm_today,
                norm_today]
})
Desc Date PII Date PI
PII within 30 days 2021-11-06 00:00:00 2021-11-03 00:00:00
PII in past and PI in future 2021-10-31 00:00:00 2021-11-11 00:00:00
PI past 2021-11-03 00:00:00 2021-11-01 00:00:00
PI empty 2021-11-03 00:00:00 NaT
PII empty NaT 2021-11-03 00:00:00
PII in future but not within 30 days 2021-12-19 00:00:00 2021-11-03 00:00:00

Upvotes: 2

Related Questions