user1777900
user1777900

Reputation: 985

Replacing values in multiple columns with Pandas based on conditions

I have a very large dataframe where I only want to change the values in a small continuous subset of columns. Basically, in those columns the values are either integers or null. All I want is to replace the 0's and nulls with 'No' and everything else with 'Yes' only in those columns

In R, this can be done basically with a one liner:

df <- df %>%
mutate_at(vars(MCI:BNP), ~factor(case_when(. > 0 ~ 'Yes',
                                           TRUE ~ 'No')))

But we're working in Python and I can't quite figure out the equivalent using Pandas. I've been messing around with loc and iloc, which work fine when only changing a single column but I must be missing something when it comes to modifying multiple columns. And the answers I've found on other stackoverflow answers have all mostly been just changing the value in a single column based on some set of conditions

col1 = df.columns.get_loc("MCI")
col2 = df.columns.get_loc("BNP")
df.iloc[:,col1:col2]

Will get me the columns I want, but trying to call loc doesn't work with multidimensional keys. I even tried it with the columns as a list instead of by integer index by creating a an extra variable

binary_var = ['MCI','PVD','CVA','DEMENTIA','CPD','RD','PUD','MLD','DWOC','DWC','HoP','RND','MALIGNANCY','SLD','MST','HIV','AKF',
          'ARMD','ASPHY','DEP','DWLK','DRUGA','DUOULC','FALL','FECAL','FLDELEX','FRAIL','GASTRICULC','GASTROULC','GLAU','HYPERKAL',
          'HYPTEN','HYPOKAL','HYPOTHYR','HYPOXE','IMMUNOS','ISCHRT','LIPIDMETA','LOSWIGT','LOWBAK','MALNUT','OSTEO','PARKIN',
          'PNEUM','RF','SEIZ','SD','TUML','UI','VI','MENTAL','FUROSEMIDE','METOPROLOL','ASPIRIN','OMEPRAZOLE','LISINOPRIL','DIGOXIN',
          'ALDOSTERONE_ANTAGONIST','ACE_INHIBITOR','ANGIOTENSIN_RECEPTOR_BLOCKERS','BETA_BLOCKERSDIURETICHoP','BUN','CREATININE',
          'SODIUM','POTASSIUM','HEMOGLOBIN','WBC_COUNT','CHLORIDE','ALBUMIN','TROPONIN','BNP']
df.loc[df[binary_var] == 0, binary_var]

But then it just can't find the index for those column names at all. I think Pandas also has problems converting columns that were originally integers into No/Yes. I don't need to do this in place, I'm probably just missing something simple that pandas has built in hopefully

In a very psuedo-code description, all I really want is this

if(df.iloc[:,col1:col2] == 0 || df.iloc[:,col1:col2].isnull())
    df ONLY in that subset of column = 'No'
else
    df ONLY in that subset of column = 'Yes'

Upvotes: 1

Views: 643

Answers (1)

Use:

df.loc[:, 'MCI':'BNP'] = np.where(df.loc[:, 'MCI':'BNP'] > 0, 'Yes', 'No')

Upvotes: 1

Related Questions