Reputation: 375
I am trying to clean a dataset like this:
date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
<dttm> <dbl> <chr> <dbl> <chr> <dbl> <chr>
1 2020-02-04 00:00:00 26.8 G 1.02 G 35.8 G
2 2020-02-04 00:01:00 27.1 G 0.98 G 35.7 G
3 2020-02-04 00:02:00 27.2 G 0.95 G 35.8 G
4 2020-02-04 00:03:00 27.0 G 0.94 G 35.7 G
5 2020-02-04 00:04:00 26.8 G 0.93 G 35.6 G
6 2020-02-04 00:05:00 26.7 G 0.94 G 35.6 G
7 2020-02-04 00:06:00 26.9 G 0.92 G 35.7 G
8 2020-02-04 00:07:00 27.0 G 0.83 G 35.4 G
9 2020-02-04 00:08:00 27.4 G 0.81 G 35.4 G
10 2020-02-04 00:09:00 27.8 G 0.72 G 35.0 G
It has the data value in one column and next to it a column with a flag associated with that value. I want to remove all other flags different than "G" from that column and replace the value of the related column by NA.
For example if my row is
date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
<dttm> <dbl> <chr> <dbl> <chr> <dbl> <chr>
1 2020-02-04 00:00:00 26.8 G 1.02 G 35.8 G
2 2020-02-04 00:01:00 27.1 C 0.98 BIA 35.7 G
10 2020-02-04 00:09:00 27.8 C 0.72 G 35.0 G
I want to get
date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
<dttm> <dbl> <chr> <dbl> <chr> <dbl> <chr>
1 2020-02-04 00:00:00 26.8 G 1.02 G 35.8 G
2 2020-02-04 00:01:00 NA NA NA NA 35.7 G
10 2020-02-04 00:09:00 NA NA 0.72 G 35.0 G
Upvotes: 2
Views: 48
Reputation: 887951
We could do this with split.default
df1[-1] <- do.call(cbind, lapply(split.default(df1[-1],
cumsum(!grepl('flag', names(df1)[-1]))),
function(x) {
x[x[[2]] != 'G', ] <- NA
x}))
df1
# date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
#1 2020-02-04 00:00:00 26.8 G 1.02 G 35.8 G
#2 2020-02-04 00:01:00 NA <NA> NA <NA> 35.7 G
#10 2020-02-04 00:09:00 NA <NA> 0.72 G 35.0 G
df1 <- structure(list(date = c("2020-02-04 00:00:00", "2020-02-04 00:01:00",
"2020-02-04 00:09:00"), pm2.5 = c(26.8, 27.1, 27.8), pm2.5_flag = c("G",
"C", "C"), NEPH = c(1.02, 0.98, 0.72), NEPH_flag = c("G", "BIA",
"G"), NEPH_RH = c(35.8, 35.7, 35), NEPH_RH_flag = c("G", "G",
"G")), class = "data.frame", row.names = c("1", "2", "10"))
Upvotes: 1
Reputation: 389325
You could create pairs of columns with 'flag'
and it's associated corresponding value. Here the columns are next to each other so we can get index of 'flag'
columns and use -1
to get corresponding columns.
flag_cols <- grep('flag', names(df))
inds <- df[flag_cols] != "G"
df[flag_cols][inds] <- NA
df[flag_cols - 1][inds] <- NA
df
# date pm2.5 pm2.5_flag NEPH NEPH_flag NEPH_RH NEPH_RH_flag
#1 2020-02-0400:00:00 26.8 G 1.02 G 35.8 G
#2 2020-02-0400:01:00 NA <NA> NA <NA> 35.7 G
#10 2020-02-0400:09:00 NA <NA> 0.72 G 35.0 G
Upvotes: 1