replace values for NA per related columns in r

Question

I am trying to clean a dataset like this:

       date                pm2.5 pm2.5_flag  NEPH NEPH_flag NEPH_RH NEPH_RH_flag
                                        
 1 2020-02-04 00:00:00  26.8 G           1.02 G            35.8 G           
 2 2020-02-04 00:01:00  27.1 G           0.98 G            35.7 G           
 3 2020-02-04 00:02:00  27.2 G           0.95 G            35.8 G           
 4 2020-02-04 00:03:00  27.0 G           0.94 G            35.7 G           
 5 2020-02-04 00:04:00  26.8 G           0.93 G            35.6 G           
 6 2020-02-04 00:05:00  26.7 G           0.94 G            35.6 G           
 7 2020-02-04 00:06:00  26.9 G           0.92 G            35.7 G           
 8 2020-02-04 00:07:00  27.0 G           0.83 G            35.4 G           
 9 2020-02-04 00:08:00  27.4 G           0.81 G            35.4 G           
10 2020-02-04 00:09:00  27.8 G           0.72 G            35.0 G

It has the data value in one column and next to it a column with a flag associated with that value. I want to remove all other flags different than "G" from that column and replace the value of the related column by NA.

For example if my row is

date                pm2.5 pm2.5_flag  NEPH NEPH_flag NEPH_RH NEPH_RH_flag
                                        
 1 2020-02-04 00:00:00  26.8 G           1.02 G            35.8 G           
 2 2020-02-04 00:01:00  27.1 C           0.98 BIA          35.7 G 
10 2020-02-04 00:09:00  27.8 C           0.72 G            35.0 G

I want to get

  date                pm2.5 pm2.5_flag  NEPH NEPH_flag NEPH_RH NEPH_RH_flag
                                        
 1 2020-02-04 00:00:00  26.8 G           1.02 G            35.8 G           
 2 2020-02-04 00:01:00  NA   NA          NA   NA           35.7 G 
10 2020-02-04 00:09:00  NA   NA          0.72 G            35.0 G

Ronak Shah · Accepted Answer

You could create pairs of columns with 'flag' and it's associated corresponding value. Here the columns are next to each other so we can get index of 'flag' columns and use -1 to get corresponding columns.

flag_cols <- grep('flag', names(df))
inds <- df[flag_cols] != "G"
df[flag_cols][inds] <- NA
df[flag_cols - 1][inds] <- NA
df

#                 date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
#1  2020-02-0400:00:00  26.8          G 1.02         G    35.8            G
#2  2020-02-0400:01:00    NA          NA          35.7            G
#10 2020-02-0400:09:00    NA        0.72         G    35.0            G

replace values for NA per related columns in r

Answers (2)

data

Related Questions