Python_Learner
Python_Learner

Reputation: 1637

Pandas Check if Column Value in Range Between Other Column Values

I'm a newbie and I am sure there's an easy way to do this that I don't know about, thank you in advance for your help.

I've got the historical min and max sales numbers for the highest and lowest sales team performer on each day of the year for the last 10 years. I've also got the same thing for 2016 (example data below).

              hist_min  hist_max   2016_min  2016_max
Day_of_Year
1               1000    10000         898     9000
2                234      896         300     1000
3               1254    23666        1000    24000
4                930    78999        1000     1050
5                278    74588         300     5000

The end goal is to chart this data in matplotlib but I only want to get the vales from the 2016_min column that is below the value in the hist_min, and similarly only have values for the 2016_max columns that are more than the hist_max column. The data would look like this:

              hist_min  hist_max   2016_min  2016_max
Day_of_Year
1               1000    10000         898     NULL
2                234      896        NULL     1000
3               1254    23666        1000    24000
4                930    78999        NULL     NULL
5                278    74588        NULL     NULL

I put 'NULL' in there to represent the empty values, Nan might be better but I don't know if matplot lib can handle 'Nan' numbers or not... that's the next step so I'll find out soon enough.

Thank you in advance for your help, Me

Upvotes: 2

Views: 5575

Answers (2)

ImportanceOfBeingErnest
ImportanceOfBeingErnest

Reputation: 339230

You can index a dataframe by a condition,

 df1 = df[df["2016_max"] > df["hist_max"]]

This can then easily plotted with matplotlib.

u = u"""Day_of_Year      hist_min  hist_max   2016_min  2016_max
1               1000    10000         898     9000
2                234      896         300     1000
3               1254    23666        1000    24000
4                930    78999        1000     1050
5                278    74588         300     5000"""

import io
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(io.StringIO(u), index_col=0, delim_whitespace=True)

df1 = df[df["2016_max"] > df["hist_max"]]
df2 = df[df["2016_min"] < df["hist_min"]]

fig, ax = plt.subplots()
ax.scatter(df1.index, df1["2016_max"], label="max. 2016")
ax.scatter(df2.index, df1["2016_min"], label="min. 2016")

plt.legend()
plt.show()

enter image description here

Upvotes: 1

jezrael
jezrael

Reputation: 862641

Use mask with boolean masks which return NaN,NULL or None if Trues:

print (df['2016_min'] > df['hist_min'])
Day_of_Year
1    False
2     True
3    False
4     True
5     True
dtype: bool

df['2016_min'] = df['2016_min'].mask(df['2016_min'] > df['hist_min'])
df['2016_max'] = df['2016_max'].mask(df['2016_max'] < df['hist_max'])
print (df)
             hist_min  hist_max  2016_min  2016_max
Day_of_Year                                        
1                1000     10000     898.0       NaN
2                 234       896       NaN    1000.0
3                1254     23666    1000.0   24000.0
4                 930     78999       NaN       NaN
5                 278     74588       NaN       NaN

df['2016_min'] = df['2016_min'].mask(df['2016_min'] > df['hist_min'], 'NULL')
df['2016_max'] = df['2016_max'].mask(df['2016_max'] < df['hist_max'], 'NULL')
print (df)
             hist_min  hist_max 2016_min 2016_max
Day_of_Year                                      
1                1000     10000      898     NULL
2                 234       896     NULL     1000
3                1254     23666     1000    24000
4                 930     78999     NULL     NULL
5                 278     74588     NULL     NULL

df['2016_min'] = df['2016_min'].mask(df['2016_min'] > df['hist_min'], None)
df['2016_max'] = df['2016_max'].mask(df['2016_max'] < df['hist_max'], None)
print (df)
             hist_min  hist_max 2016_min 2016_max
Day_of_Year                                      
1                1000     10000      898     None
2                 234       896     None     1000
3                1254     23666     1000    24000
4                 930     78999     None     None
5                 278     74588     None     None

Upvotes: 1

Related Questions