Reputation: 7576
I want to create a new column, which has to adhere to multiple conditions in multiple places. I have three dataframes:
- The one I'd like to add the column to: data
ID Timestamp
25 201605252300 #Y/M/D/H/M
29 201605252315
94 201605252015
- The one that contains info on data availability: dataavail
Timestamp Availability
201605252300 True
201605252315 False
201605252015 True
- The one that contains the actual data weatherdata
ID Timestamp Precipitation
25 201605252300 1
The thing is, that if there is no rain, that is not marked by weatherdata.Precipitation
being 0
, but by simply missing from weatherdata
. However, not all data that is missing means no precipitation, it can also mean technical failure. That's where dataavail
comes in. If for a Timestamp
in data
the Availability
is False
, that means there was a technical problem and there's no data. If for a Timestamp
in data
the Availability
is True
, that means the precipitation was actually 0.
Now, I'd like to combine this info into the data
field. So:
IF the Timestamp/ID
pair from data
exists in weatherdata
, it should simply pull the info from there.
IF the Timestamp/ID
pair from data
DOES NOT exists in weatherdata
, it should check the dataavail
dataframe AND IF the Availability
is True
, the Precipitation
should be 0
, otherwise is should be NA/NaN
so I can impute it later.
So for the above example, the desired output would be
ID Timestamp Precipitation
25 201605252300 1
29 201605252315 NaN
94 201605252015 0
What's the syntax to use in such a setup, where mutiple conditions have to be satisfied, based on sub-conditions? Thank you!
UPDATE
Tried jezrael's solution. I have the following dataframes:
So then I do mergedAll = pd.merge(imp_df, precip, on=['SId','Time'], how='left')
on them. The result:
I've checked programatically, and ALL the data in Intensity
is NaN
. dtypes
match:
UPDATE 2
After sorting my data out, the ˙mask˙ part still doesn't work:
Even though 201311292300,TRUE
is in the availability dataframe, the Intensity gets translated to NaN
instead of 0.
Upvotes: 1
Views: 652
Reputation: 862641
I think you need:
#add weatherdata data if exist
df = pd.merge(data, weatherdata, on=['ID','Timestamp'], how='left')
#get Timestamp values where True in column Availability
vals = dataavail.loc[dataavail.Availability, 'Timestamp']
print (vals)
0 201605252300
2 201605252015
Name: Timestamp, dtype: int64
#True if exist Timistamps in vals and if not exist in df (NaN)
mask = df.Timestamp.isin(vals) & df.Precipitation.isnull()
#convert True to 0
df.Precipitation = df.Precipitation.mask(mask, 0)
print (df)
ID Timestamp Precipitation
0 25 201605252300 1.0
1 29 201605252315 NaN
2 94 201605252015 0.0
Upvotes: 1