Jhonathan
Jhonathan

Reputation: 375

replace values for NA per related columns in r

I am trying to clean a dataset like this:

       date                pm2.5 pm2.5_flag  NEPH NEPH_flag NEPH_RH NEPH_RH_flag
   <dttm>              <dbl> <chr>      <dbl> <chr>       <dbl> <chr>       
 1 2020-02-04 00:00:00  26.8 G           1.02 G            35.8 G           
 2 2020-02-04 00:01:00  27.1 G           0.98 G            35.7 G           
 3 2020-02-04 00:02:00  27.2 G           0.95 G            35.8 G           
 4 2020-02-04 00:03:00  27.0 G           0.94 G            35.7 G           
 5 2020-02-04 00:04:00  26.8 G           0.93 G            35.6 G           
 6 2020-02-04 00:05:00  26.7 G           0.94 G            35.6 G           
 7 2020-02-04 00:06:00  26.9 G           0.92 G            35.7 G           
 8 2020-02-04 00:07:00  27.0 G           0.83 G            35.4 G           
 9 2020-02-04 00:08:00  27.4 G           0.81 G            35.4 G           
10 2020-02-04 00:09:00  27.8 G           0.72 G            35.0 G 

It has the data value in one column and next to it a column with a flag associated with that value. I want to remove all other flags different than "G" from that column and replace the value of the related column by NA.

For example if my row is

date                pm2.5 pm2.5_flag  NEPH NEPH_flag NEPH_RH NEPH_RH_flag
   <dttm>              <dbl> <chr>      <dbl> <chr>       <dbl> <chr>       
 1 2020-02-04 00:00:00  26.8 G           1.02 G            35.8 G           
 2 2020-02-04 00:01:00  27.1 C           0.98 BIA          35.7 G 
10 2020-02-04 00:09:00  27.8 C           0.72 G            35.0 G

I want to get

  date                pm2.5 pm2.5_flag  NEPH NEPH_flag NEPH_RH NEPH_RH_flag
   <dttm>              <dbl> <chr>      <dbl> <chr>       <dbl> <chr>       
 1 2020-02-04 00:00:00  26.8 G           1.02 G            35.8 G           
 2 2020-02-04 00:01:00  NA   NA          NA   NA           35.7 G 
10 2020-02-04 00:09:00  NA   NA          0.72 G            35.0 G

Upvotes: 2

Views: 48

Answers (2)

akrun
akrun

Reputation: 887951

We could do this with split.default

df1[-1] <- do.call(cbind,  lapply(split.default(df1[-1], 
         cumsum(!grepl('flag', names(df1)[-1]))),
                function(x) {
           x[x[[2]] != 'G', ] <- NA
         x}))
df1
#                  date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
#1  2020-02-04 00:00:00  26.8          G 1.02         G    35.8            G
#2  2020-02-04 00:01:00    NA       <NA>   NA      <NA>    35.7            G
#10 2020-02-04 00:09:00    NA       <NA> 0.72         G    35.0            G

data

df1 <- structure(list(date = c("2020-02-04 00:00:00", "2020-02-04 00:01:00", 
"2020-02-04 00:09:00"), pm2.5 = c(26.8, 27.1, 27.8), pm2.5_flag = c("G", 
"C", "C"), NEPH = c(1.02, 0.98, 0.72), NEPH_flag = c("G", "BIA", 
"G"), NEPH_RH = c(35.8, 35.7, 35), NEPH_RH_flag = c("G", "G", 
"G")), class = "data.frame", row.names = c("1", "2", "10"))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389325

You could create pairs of columns with 'flag' and it's associated corresponding value. Here the columns are next to each other so we can get index of 'flag' columns and use -1 to get corresponding columns.

flag_cols <- grep('flag', names(df))
inds <- df[flag_cols] != "G"
df[flag_cols][inds] <- NA
df[flag_cols - 1][inds] <- NA
df

#                 date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
#1  2020-02-0400:00:00  26.8          G 1.02         G    35.8            G
#2  2020-02-0400:01:00    NA       <NA>   NA      <NA>    35.7            G
#10 2020-02-0400:09:00    NA       <NA> 0.72         G    35.0            G

Upvotes: 1

Related Questions