lancet
lancet

Reputation: 33

Replace NA with value based on row's condition

A five columns table ("id", "othermood_v","rass_v", "gcs_v" and "cam_v") with around 52000 rows. There are three values in the last column ("cam_v"):0,1,2 as a class label. The "cam_v" column now has value as 1, 2 and NA. I would like to replace the NA value with either 0 or 1 based the other three columns "othermood_v", "rass_v" and "gcs_v". So if any of these three columns in the same row has a value like 1, then cam_v would be labeled as 1, otherwise 0. I tried to loop through data with a condition like

 if df$othermood_v>0|df$rass_v>0|df$gcs_v >0, then df$cam_v=1 else 
  0, rm NA = True

or

if (df$othermood_v+df$rass_v+df$gcs_v) >0, then cam_v=1 else 0

But I don't know how to get it to work. Any suggestions? BTW, the id is unique now. Thanks.

id  othermood_v rass_v  gcs_v   cam_v
100078  0   0   0   NA
100079  0   0   0   NA
100081  0   0   0   NA
100085  1   1   0   NA
100087  1   1   0   NA
100088  1   0   0   NA
100091  1   1   1   2
100094  0   1   0   NA
100095  1   0   0   NA
100096  0   0   0   NA
100098  1   1   1   2
100099  0   1   0   NA
100102  1   0   0   NA
100103  1   0   0   NA
100104  1   1   0   2
100106  0   0   0   NA
100108  1   0   0   NA
100109  1   0   0   NA
100112  1   0   0   NA
100113  1   1   1   1
100114  1   0   0   NA
100116  1   0   0   NA
100117  1   0   0   NA
100118  0   1   0   NA

table screenshot

Upvotes: 2

Views: 393

Answers (3)

MLPNPC
MLPNPC

Reputation: 525

You were close with your method, You just needed to change how you were doing if else. The below should work:

df$cam_v<-ifelse((df$othermood_v>0|df$rass_v>0|df$gcs_v >0), 1,0) 

Upvotes: 0

akrun
akrun

Reputation: 887941

We create a logical vector and then replace using another condition created with rowSums

i1 <- is.na(df1$cam_v) # logical index of NA elements in 'cam_v'
# assign the values 0 or 1 based on the occurrence of 1 in 
# either one of the columns from 2 to 4
df1$cam_v[i1] <- +(rowSums(df1[i1, 2:4] > 0) > 0)

data

df1 <- structure(list(id = c(100078L, 100079L, 100081L, 100085L, 100087L, 
100088L, 100091L, 100094L, 100095L, 100096L, 100098L, 100099L, 
100102L, 100103L, 100104L, 100106L, 100108L, 100109L, 100112L, 
100113L, 100114L, 100116L, 100117L, 100118L), othermood_v = c(0L, 
0L, 0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 0L), rass_v = c(0L, 0L, 0L, 1L, 1L, 0L, 
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 
0L, 1L), gcs_v = c(0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L), cam_v = c(NA, 
NA, NA, NA, NA, NA, 2L, NA, NA, NA, 2L, NA, NA, NA, 2L, NA, NA, 
NA, NA, 1L, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-24L))

Upvotes: 1

JBGruber
JBGruber

Reputation: 12478

A solution using dplyr

library(dplyr)
df_clean <- df %>% 
  mutate(cam_v = ifelse(!is.na(cam_v), cam_v, 
                               ifelse((othermood_v + rass_v + gcs_v) > 0, 1, 0)))
> df_clean
       id othermood_v rass_v gcs_v cam_v
1  100078           0      0     0     0
2  100079           0      0     0     0
3  100081           0      0     0     0
4  100085           1      1     0     1
5  100087           1      1     0     1
6  100088           1      0     0     1
7  100091           1      1     1     2
8  100094           0      1     0     1
9  100095           1      0     0     1
10 100096           0      0     0     0
11 100098           1      1     1     2
12 100099           0      1     0     1
13 100102           1      0     0     1
14 100103           1      0     0     1
15 100104           1      1     0     2
16 100106           0      0     0     0
17 100108           1      0     0     1
18 100109           1      0     0     1
19 100112           1      0     0     1
20 100113           1      1     1     1
21 100114           1      0     0     1
22 100116           1      0     0     1
23 100117           1      0     0     1
24 100118           0      1     0     1

Data

Generally, it is preferred here to use dput(head(data, 20)) to provide sample data for your code. I used this to transform yours data:

df <- read.table(text =
  "id  othermood_v rass_v  gcs_v   cam_v
  100078  0   0   0   NA
  100079  0   0   0   NA
  100081  0   0   0   NA
  100085  1   1   0   NA
  100087  1   1   0   NA
  100088  1   0   0   NA
  100091  1   1   1   2
  100094  0   1   0   NA
  100095  1   0   0   NA
  100096  0   0   0   NA
  100098  1   1   1   2
  100099  0   1   0   NA
  100102  1   0   0   NA
  100103  1   0   0   NA
  100104  1   1   0   2
  100106  0   0   0   NA
  100108  1   0   0   NA
  100109  1   0   0   NA
  100112  1   0   0   NA
  100113  1   1   1   1
  100114  1   0   0   NA
  100116  1   0   0   NA
  100117  1   0   0   NA
  100118  0   1   0   NA", header = TRUE)

Upvotes: 1

Related Questions