tonysg
tonysg

Reputation: 64

How to apply different conditions to different months in a pandas Dataframe?

so I'm trying to apply different conditions that depends on a date, months to be specific. For example, for January replace the data in TEMP that is above 45 but for February that is above 30 and so on. I already did that with the code below, but the problem is that the data from the previous month is replace it with nan.

This is my code:

meses = ["01", "02"]

for i in var_vars:
    if i in dataframes2.columns.values:
        for j in range(len(meses)):
            test_prueba_mes = dataframes2[i].loc[dataframes2['fecha'].dt.month == int(meses[j])]
            test_prueba = test_prueba_mes[dataframes2[i]<dataframes.loc[i]["X"+meses[j]+".max"]]
            dataframes2["Prueba " + str(i)] = test_prueba

Output:

    dataframes2.tail(5)

 fecha  TEMP_C_Avg  RH_Avg  Prueba TEMP_C_Avg  Prueba RH_Avg
21 2020-01-01 22:00:00        46.0     103                NaN            NaN
22 2020-01-01 23:00:00        29.0     103                NaN            NaN
23 2020-01-02 00:00:00        31.0       3                NaN            NaN
24 2020-01-02 12:00:00        31.0       2                NaN            NaN
25 2020-02-01 10:00:00        29.0       5               29.0            5.0

My desired Output is:

Output:

fecha  TEMP_C_Avg  RH_Avg  Prueba TEMP_C_Avg  Prueba RH_Avg
21 2020-01-01 22:00:00        46.0     103                NaN            NaN
22 2020-01-01 23:00:00        29.0     103               29.0            NaN
23 2020-01-02 00:00:00        31.0       3               31.0            3.0
24 2020-01-02 12:00:00        31.0       2               31.0            2.0
25 2020-02-01 10:00:00        29.0       5               29.0            5.0

Appreciate if anyone can help me.

Update: The ruleset for 6 months is jan 45, feb 30, mar 45, abr 10, may 15, jun 30

An example of the data:

                 fecha  TEMP_C_Avg  RH_Avg
25 2020-02-01 10:00:00        29.0       5
26 2020-02-01 11:00:00        32.0     105
27 2020-03-01 10:00:00        55.0       3
28 2020-03-01 11:00:00        40.0       5
29 2020-04-01 10:00:00        10.0      20
30 2020-04-01 11:00:00         5.0      15
31 2020-05-01 10:00:00        20.0      15
32 2020-05-01 11:00:00         5.0     106
33 2020-06-01 10:00:00        33.0     107
34 2020-06-01 11:00:00        20.0      20

Upvotes: 0

Views: 181

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31166

With clear understanding

  • have encoded monthly limits into a dict limits
  • use numpy select(), when a condition matches take value corresponding to condition from second parameter. Default to third parameter
  • build conditions dynamically from limits dict
  • second parameter needs to be same length as conditions list. Build list of np.nan as list comprehension so it's correct length
  • to consider all columns, use a dict comprehension that builds **kwarg params to assign()
df = pd.read_csv(io.StringIO("""                 fecha  TEMP_C_Avg  RH_Avg
25  2020-02-01 10:00:00        29.0       5
26  2020-02-01 11:00:00        32.0     105
27  2020-03-01 10:00:00        55.0       3
28  2020-03-01 11:00:00        40.0       5
29  2020-04-01 10:00:00        10.0      20
30  2020-04-01 11:00:00         5.0      15
31  2020-05-01 10:00:00        20.0      15
32  2020-05-01 11:00:00         5.0     106
33  2020-06-01 10:00:00        33.0     107
34  2020-06-01 11:00:00        20.0      20"""), sep="\s\s+", engine="python")
df.fecha = pd.to_datetime(df.fecha)

# The ruleset for 6 months is jan 45, feb 30, mar 45, abr 10, may 15, jun 30
limits = {1:45, 2:30, 3:45, 4:10, 5:15, 6:30}

df = df.assign(**{f"Prueba {c}":np.select( # construct target column name
        # build a condition for each of the month limits
        [df.fecha.dt.month.eq(m) & df[c].gt(l) for m,l in limits.items()], 
        [np.nan for m in limits.keys()], # NaN if beyond limit
        df[c]) # keep value if within limits
            for c in df.columns if "Avg" in c}) # do calc for all columns that have "Avg" in name

fecha TEMP_C_Avg RH_Avg Prueba TEMP_C_Avg Prueba RH_Avg
25 2020-02-01 10:00:00 29 5 29 5
26 2020-02-01 11:00:00 32 105 nan nan
27 2020-03-01 10:00:00 55 3 nan 3
28 2020-03-01 11:00:00 40 5 40 5
29 2020-04-01 10:00:00 10 20 10 nan
30 2020-04-01 11:00:00 5 15 5 nan
31 2020-05-01 10:00:00 20 15 nan 15
32 2020-05-01 11:00:00 5 106 5 nan
33 2020-06-01 10:00:00 33 107 nan nan
34 2020-06-01 11:00:00 20 20 20 20

Upvotes: 1

Related Questions