Reputation: 64
so I'm trying to apply different conditions that depends on a date, months to be specific. For example, for January replace the data in TEMP that is above 45 but for February that is above 30 and so on. I already did that with the code below, but the problem is that the data from the previous month is replace it with nan.
This is my code:
meses = ["01", "02"]
for i in var_vars:
if i in dataframes2.columns.values:
for j in range(len(meses)):
test_prueba_mes = dataframes2[i].loc[dataframes2['fecha'].dt.month == int(meses[j])]
test_prueba = test_prueba_mes[dataframes2[i]<dataframes.loc[i]["X"+meses[j]+".max"]]
dataframes2["Prueba " + str(i)] = test_prueba
Output:
dataframes2.tail(5)
fecha TEMP_C_Avg RH_Avg Prueba TEMP_C_Avg Prueba RH_Avg
21 2020-01-01 22:00:00 46.0 103 NaN NaN
22 2020-01-01 23:00:00 29.0 103 NaN NaN
23 2020-01-02 00:00:00 31.0 3 NaN NaN
24 2020-01-02 12:00:00 31.0 2 NaN NaN
25 2020-02-01 10:00:00 29.0 5 29.0 5.0
My desired Output is:
Output:
fecha TEMP_C_Avg RH_Avg Prueba TEMP_C_Avg Prueba RH_Avg
21 2020-01-01 22:00:00 46.0 103 NaN NaN
22 2020-01-01 23:00:00 29.0 103 29.0 NaN
23 2020-01-02 00:00:00 31.0 3 31.0 3.0
24 2020-01-02 12:00:00 31.0 2 31.0 2.0
25 2020-02-01 10:00:00 29.0 5 29.0 5.0
Appreciate if anyone can help me.
Update: The ruleset for 6 months is jan 45, feb 30, mar 45, abr 10, may 15, jun 30
An example of the data:
fecha TEMP_C_Avg RH_Avg
25 2020-02-01 10:00:00 29.0 5
26 2020-02-01 11:00:00 32.0 105
27 2020-03-01 10:00:00 55.0 3
28 2020-03-01 11:00:00 40.0 5
29 2020-04-01 10:00:00 10.0 20
30 2020-04-01 11:00:00 5.0 15
31 2020-05-01 10:00:00 20.0 15
32 2020-05-01 11:00:00 5.0 106
33 2020-06-01 10:00:00 33.0 107
34 2020-06-01 11:00:00 20.0 20
Upvotes: 0
Views: 181
Reputation: 31166
With clear understanding
dict
limitsselect()
, when a condition matches take value corresponding to condition from second parameter. Default to third parameterdict
list
. Build list of np.nan
as list
comprehension so it's correct lengthdict
comprehension that builds **kwarg
params to assign()
df = pd.read_csv(io.StringIO(""" fecha TEMP_C_Avg RH_Avg
25 2020-02-01 10:00:00 29.0 5
26 2020-02-01 11:00:00 32.0 105
27 2020-03-01 10:00:00 55.0 3
28 2020-03-01 11:00:00 40.0 5
29 2020-04-01 10:00:00 10.0 20
30 2020-04-01 11:00:00 5.0 15
31 2020-05-01 10:00:00 20.0 15
32 2020-05-01 11:00:00 5.0 106
33 2020-06-01 10:00:00 33.0 107
34 2020-06-01 11:00:00 20.0 20"""), sep="\s\s+", engine="python")
df.fecha = pd.to_datetime(df.fecha)
# The ruleset for 6 months is jan 45, feb 30, mar 45, abr 10, may 15, jun 30
limits = {1:45, 2:30, 3:45, 4:10, 5:15, 6:30}
df = df.assign(**{f"Prueba {c}":np.select( # construct target column name
# build a condition for each of the month limits
[df.fecha.dt.month.eq(m) & df[c].gt(l) for m,l in limits.items()],
[np.nan for m in limits.keys()], # NaN if beyond limit
df[c]) # keep value if within limits
for c in df.columns if "Avg" in c}) # do calc for all columns that have "Avg" in name
fecha | TEMP_C_Avg | RH_Avg | Prueba TEMP_C_Avg | Prueba RH_Avg | |
---|---|---|---|---|---|
25 | 2020-02-01 10:00:00 | 29 | 5 | 29 | 5 |
26 | 2020-02-01 11:00:00 | 32 | 105 | nan | nan |
27 | 2020-03-01 10:00:00 | 55 | 3 | nan | 3 |
28 | 2020-03-01 11:00:00 | 40 | 5 | 40 | 5 |
29 | 2020-04-01 10:00:00 | 10 | 20 | 10 | nan |
30 | 2020-04-01 11:00:00 | 5 | 15 | 5 | nan |
31 | 2020-05-01 10:00:00 | 20 | 15 | nan | 15 |
32 | 2020-05-01 11:00:00 | 5 | 106 | 5 | nan |
33 | 2020-06-01 10:00:00 | 33 | 107 | nan | nan |
34 | 2020-06-01 11:00:00 | 20 | 20 | 20 | 20 |
Upvotes: 1