How to apply different conditions to different months in a pandas Dataframe?

Question

so I'm trying to apply different conditions that depends on a date, months to be specific. For example, for January replace the data in TEMP that is above 45 but for February that is above 30 and so on. I already did that with the code below, but the problem is that the data from the previous month is replace it with nan.

This is my code:

meses = ["01", "02"]

for i in var_vars:
    if i in dataframes2.columns.values:
        for j in range(len(meses)):
            test_prueba_mes = dataframes2[i].loc[dataframes2['fecha'].dt.month == int(meses[j])]
            test_prueba = test_prueba_mes[dataframes2[i]



Output:

    dataframes2.tail(5)

 fecha  TEMP_C_Avg  RH_Avg  Prueba TEMP_C_Avg  Prueba RH_Avg
21 2020-01-01 22:00:00        46.0     103                NaN            NaN
22 2020-01-01 23:00:00        29.0     103                NaN            NaN
23 2020-01-02 00:00:00        31.0       3                NaN            NaN
24 2020-01-02 12:00:00        31.0       2                NaN            NaN
25 2020-02-01 10:00:00        29.0       5               29.0            5.0

My desired Output is:

Output:

fecha  TEMP_C_Avg  RH_Avg  Prueba TEMP_C_Avg  Prueba RH_Avg
21 2020-01-01 22:00:00        46.0     103                NaN            NaN
22 2020-01-01 23:00:00        29.0     103               29.0            NaN
23 2020-01-02 00:00:00        31.0       3               31.0            3.0
24 2020-01-02 12:00:00        31.0       2               31.0            2.0
25 2020-02-01 10:00:00        29.0       5               29.0            5.0

Appreciate if anyone can help me.
Update: The ruleset for 6 months is jan 45, feb 30, mar 45, abr 10, may 15, jun 30
An example of the data:
                 fecha  TEMP_C_Avg  RH_Avg
25 2020-02-01 10:00:00        29.0       5
26 2020-02-01 11:00:00        32.0     105
27 2020-03-01 10:00:00        55.0       3
28 2020-03-01 11:00:00        40.0       5
29 2020-04-01 10:00:00        10.0      20
30 2020-04-01 11:00:00         5.0      15
31 2020-05-01 10:00:00        20.0      15
32 2020-05-01 11:00:00         5.0     106
33 2020-06-01 10:00:00        33.0     107
34 2020-06-01 11:00:00        20.0      20

Rob Raymond · Accepted Answer

With clear understanding

have encoded monthly limits into a dict limits
use numpy select(), when a condition matches take value corresponding to condition from second parameter. Default to third parameter
build conditions dynamically from limits dict
second parameter needs to be same length as conditions list. Build list of np.nan as list comprehension so it's correct length
to consider all columns, use a dict comprehension that builds **kwarg params to assign()

df = pd.read_csv(io.StringIO("""                 fecha  TEMP_C_Avg  RH_Avg
25  2020-02-01 10:00:00        29.0       5
26  2020-02-01 11:00:00        32.0     105
27  2020-03-01 10:00:00        55.0       3
28  2020-03-01 11:00:00        40.0       5
29  2020-04-01 10:00:00        10.0      20
30  2020-04-01 11:00:00         5.0      15
31  2020-05-01 10:00:00        20.0      15
32  2020-05-01 11:00:00         5.0     106
33  2020-06-01 10:00:00        33.0     107
34  2020-06-01 11:00:00        20.0      20"""), sep="\s\s+", engine="python")
df.fecha = pd.to_datetime(df.fecha)

# The ruleset for 6 months is jan 45, feb 30, mar 45, abr 10, may 15, jun 30
limits = {1:45, 2:30, 3:45, 4:10, 5:15, 6:30}

df = df.assign(**{f"Prueba {c}":np.select( # construct target column name
        # build a condition for each of the month limits
        [df.fecha.dt.month.eq(m) & df[c].gt(l) for m,l in limits.items()], 
        [np.nan for m in limits.keys()], # NaN if beyond limit
        df[c]) # keep value if within limits
            for c in df.columns if "Avg" in c}) # do calc for all columns that have "Avg" in name

	fecha	TEMP_C_Avg	RH_Avg	Prueba TEMP_C_Avg	Prueba RH_Avg
25	2020-02-01 10:00:00	29	5	29	5
26	2020-02-01 11:00:00	32	105	nan	nan
27	2020-03-01 10:00:00	55	3	nan	3
28	2020-03-01 11:00:00	40	5	40	5
29	2020-04-01 10:00:00	10	20	10	nan
30	2020-04-01 11:00:00	5	15	5	nan
31	2020-05-01 10:00:00	20	15	nan	15
32	2020-05-01 11:00:00	5	106	5	nan
33	2020-06-01 10:00:00	33	107	nan	nan
34	2020-06-01 11:00:00	20	20	20	20

How to apply different conditions to different months in a pandas Dataframe?

Answers (1)

Related Questions