Reputation: 377
I am using pandas with Python and I have a dataframe data. I have another dataframe missing_vals, which contains a field column and a key column. The field column contains elements that correspond to names of the columns of data, i.e. data.columns ~= missing_vals['field']. The mapping, however, is not one-to-one (some entries in missing_vals['field'] do not exist in data.columns). I did a set intersection operation to take care of that and got an output array result containing all the values that are in both missing_vals['field'] and data.columns. Now I want to index into data using each element of result, check whether that column contains the value corresponding to the element in missing_vals['key'], and replace it with NaN. I tried using for-loops, but I know this is not the ideal way to do it. Is there a way to do it with vector/lambda operations or perhaps with other dataframe functions? I am new to pandas so I would really appreciate some help.
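For reference, here is roughly how I built result with small made-up frames (the column contents are just example data, not my real dataframes):

```python
import pandas as pd

# Example frames standing in for my real data
data = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
missing_vals = pd.DataFrame({'field': ['a', 'b', 'z'],  # 'z' is not a column of data
                             'key': [2, 6, 9]})

# Keep only the fields that are actually columns of data
result = data.columns.intersection(missing_vals['field'])
```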
Here is my code so far:
for i in range(len(result)):
    field = missing_vals['field'][i]
    for j in range(data[field].size):
        if data[field][j] == missing_vals['key'][i]:
            data.replace(data[field][j], np.nan)
Thanks
Upvotes: 0
Views: 944
Reputation: 8906
You should really post sample input/output - these things are difficult to explain verbally. Anyway, I think the second loop can be done away with entirely. You really just have to do:
field = missing_vals['field'][i]
data[field] = data[field].replace(missing_vals['key'][i], np.nan)
The replace method replaces all occurrences of the value with the replacement, and if there are none it does nothing, so it's unnecessary to loop through the column yourself to check whether the value to be replaced is there. Note that replace returns a new Series rather than modifying it in place (unless you pass inplace=True), so you need to assign the result back. If you post representative examples of the data frames in question I can probably help you more.
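In fact you can drop the outer loop as well by passing a nested dict to DataFrame.replace, which maps each column to the value you want replaced in it. A minimal sketch with made-up sample frames (since you didn't post yours):

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for your real dataframes
data = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
missing_vals = pd.DataFrame({'field': ['a', 'b', 'z'],  # 'z' has no matching column
                             'key': [2, 6, 9]})

# Build {column: {value_to_replace: NaN}}, skipping fields that are not
# columns of data, then let one replace call handle everything
mapping = {f: {k: np.nan}
           for f, k in zip(missing_vals['field'], missing_vals['key'])
           if f in data.columns}
data = data.replace(mapping)
```

This also makes the explicit set intersection unnecessary, since the dict comprehension filters out fields that don't exist in data.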
Upvotes: 1