user13474103
user13474103

Reputation:

Highlight python pandas dataframe if certain values are found

It is from last example in Chapter 7 of 'Pandas Cookbook', using the flight.csv dataset. The objective is to find the longest delay streak for each airline and origin airport combinations. I made slight modification from my own.

def max_delay_streak(df):
  df = df.reset_index(drop=True)
  s = 1- df['ON_TIME']
  s1 = s.cumsum()
  streak = s.mul(s1).diff().where(lambda x: x < 0).ffill().add(s1, fill_value =0)
  df['streak'] = streak
  last_idx = streak.idxmax()
  max_streak = streak.max()

  # my slight modification here to accommodate delay streak equals 0
  if max_streak == 0:
      first_idx = 0
  else:
     first_idx = last_idx - max_streak + 1

  df_return = df.loc[[first_idx, last_idx],['MONTH','DAY']]
  df_return['streak'] = max_streak  
  df_return.index = ['first','last']
  df_return.index.name = 'streak_row'

  # search and operate zero streak

  # my adjustment to find index where there is no delay streak
  # df_return[df_return['streak'] == 0].index
  # gets the MultiIndex([('EV', 'PHX', 'first'), ('EV', 'PHX',  'last')],
  #      names=['AIRLINE', 'ORG_AIR', 'streak_row'])
  no_streak = df_return[df_return['streak'] == 0].index
  # get the data from respective index and return month/day into '-'
  df_return.loc[no_streak,['MONTH','DAY']] = '-'
  return df_return 

  flights.sort_values(['MONTH','DAY','SCHED_DEP']).groupby(['AIRLINE','ORG_AIR']).apply(max_delay_streak)

The code runs OK here. Next I try to highlight the rows in yellow where delay streak is 0 (or any other number).

desired_result

I tried 2 methods, which the program runs without error, and produce the original dataframe without highlight anything.

Method 1: reuse the .loc logic in the last row of the above program, to use the index to get into specific row to add color.

df_return.loc[no_streak].style.apply('background-color: yellow',axis=1)

Method 2: an ugly way. I tried to extract all (airline, origin airport, first/last) index, check them against index of zero delay streak, where the information is stored in variable 'no_streak' (in this case ('EV', 'PHX', 'first'), ('EV', 'PHX', 'last')). If the condition is satisified, then apply the color.

df_return.style.apply(['background-color: yellow' for x in list(df_return.index) if x in list(no_streak)], axis=1)

Why my code failed to get the desired picture? Is it possible to achieve the goal?

Upvotes: 0

Views: 674

Answers (2)

user13474103
user13474103

Reputation:

While the code is not elegant, but I finally got what I desired. Enter a number and the dataframe will highlight all the corresponding rows that fits the search criteria, the result is like this pic.enter image description here

df = flights.sort_values(['MONTH','DAY','SCHED_DEP']).groupby(['AIRLINE','ORG_AIR']).apply(max_delay_streak)
streak_no = input("Enter streak no: ")
streak_no = int(streak_no)
color_dict = {"AA": "lightcoral", "AS": "orangered", "B6": "orange", "DL": "yellow" , "EV": "lawngreen", "F9": "palegreen", "HA": "lightcyan", "MQ": "aqua",
  "NK": "skyblue", "OO": "lightsteelblue", "UA": 'lavender', "US": "violet", "VX": "magenta", "WN": "pink"}

# first level coloring
# get the first level index value
first_level_index = df.index.get_level_values(0)
# get unique first level value where criteria is met
no_streak_row_unique = df[df['streak'] == streak_no].index.get_level_values(0).unique()
# decide which row in first level to color
first_level_color_arrangement = [ {'selector': f'.row{i}.level0', 
          'props': [('background-color', color_dict[j])]} if j in no_streak_row_unique else {'selector': f'.row{i}.level0', 
          'props': [('background-color', 'default')]}for i,j in enumerate(first_level_index)]

# second level unique
second_level_index = list(zip(df.index.get_level_values(0) , df.index.get_level_values(1)))
# no_streak_row_2_unique
no_streak_row_2_unique = list(set(zip(df[df['streak'] == streak_no].index.get_level_values(0), df[df['streak'] == streak_no].index.get_level_values(1))))
second_level_color_arrangement = [ {'selector': f'.row{i}', 
          'props': [('background-color', color_dict[j[0]])]} if j in no_streak_row_2_unique else {'selector': f'.row{i}', 
          'props': [('background-color', 'default')]}for i,j in enumerate(second_level_index)]

df.style.set_table_styles(first_level_color_arrangement + second_level_color_arrangement)

Upvotes: 0

Pascalco
Pascalco

Reputation: 2826

Perform the styling outside of the max_delay_streak() function.

import pandas as pd
flights = pd.read_csv('flights.csv')
flights['ON_TIME'] = flights['ARR_DELAY'].lt(15).astype(int)
flights_agg = flights.sort_values(['MONTH', 'DAY', 'SCHED_DEP']).groupby(['AIRLINE', 'ORG_AIR']).apply(max_delay_streak)
flights_agg.style.apply(lambda x: ['background-color: yellow']*3 if x.streak == 0 else ['background-color: default']*3, axis=1)

where max_delay_streak() is the function defined in the question.

Upvotes: 1

Related Questions