cavs

Reputation: 377

Python pandas: updating dataframe series values based on values contained in another dataframe

I am using pandas with Python, and I have a dataframe data. I also have another dataframe, missing_vals, which contains a field column and a key column. The field column contains elements that correspond to names of the columns of data, i.e. data.columns ~= missing_vals['field']. The mapping, however, is not one-to-one (some entries in missing_vals['field'] do not exist in data.columns). I did a set intersection to take care of that and got an output array result containing all the values that appear in both missing_vals['field'] and data.columns.

Now I want to index into data using each element of result, check whether that column contains the value given by the corresponding element of missing_vals['key'], and replace it with NaN. I tried using for-loops, but I know this is not the ideal way to do it. Is there a way to do it with vector/lambda operations, or perhaps with other dataframe functions? I am new to pandas, so I would really appreciate some help.
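To illustrate the structure (the column names and placeholder values below are made up, since I can't share the real data):

import pandas as pd

# made-up frames with the structure described above
data = pd.DataFrame({'age': [25, -1, 40], 'height': [170, 180, -999]})
missing_vals = pd.DataFrame({'field': ['age', 'height', 'weight'],
                             'key': [-1, -999, 0]})

# columns that appear both in data and in missing_vals['field']
result = set(missing_vals['field']) & set(data.columns)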

Here is my code so far:

for i in range(len(result)):
    field = missing_vals['field'][i]
    for j in range(data[field].size):
        if (data[field][j] == missing_vals['key'][i]):
            data.replace(data[field][j], np.nan)

Thanks

Upvotes: 0

Views: 944

Answers (1)

JoeCondron

Reputation: 8906

You should really post sample input/output - these things are difficult to explain verbally. Anyway, I think the second loop can be done away with entirely. You really just need:

field = missing_vals['field'][i]
data[field] = data[field].replace(missing_vals['key'][i], np.nan)

The replace method replaces all occurrences of the given value, and if there are none it does nothing, so it's unnecessary to loop through the rows yourself to check whether the value is there. If you post representative examples of the data frames in question I can probably help you more.
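For example, assuming frames along these lines (the names and sentinel values are just placeholders, not your real data), the whole thing can even be collapsed into a single replace call using a nested dict, which pandas reads as "in this column, replace this value":

import numpy as np
import pandas as pd

# hypothetical stand-ins for your data and missing_vals
data = pd.DataFrame({'age': [25, -1, 40], 'height': [170, 180, -999]})
missing_vals = pd.DataFrame({'field': ['age', 'height', 'weight'],
                             'key': [-1, -999, 0]})

# keep only the fields that are actually columns of data
result = set(missing_vals['field']) & set(data.columns)

# {column: {value_to_replace: NaN}} - one replace call handles every column
mapping = {f: {k: np.nan}
           for f, k in zip(missing_vals['field'], missing_vals['key'])
           if f in result}
data = data.replace(mapping)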

Upvotes: 1
