Andrew Bannerman
Andrew Bannerman

Reputation: 1305

Combine two columns (numerical values) and retain NAs

I have two columns that I wish to combine. I would usually sum both columns. However, in this case I need to retain the original NAs and only have the numerical values combined from two columns to one. I have tried to use ifelse statements but since im working with 2x columns then combining both is trickier with nested ifelse.

Heres my example data:

# Example

    data <- c(80,7.692307692,
              7.692307692
              ,8.333333333
              ,9.090909091
              ,20
              ,27.27272727
              ,50
              ,50
              ,21.42857143
              ,58.33333333
              ,46.66666667
              ,78.06451613
              ,186.15384615
              ,42.85714286
              ,44.1860465,
              20,
              25,
              40,45,78,55)
    df <- data.frame(data)
    df$long <- ifelse(df$data <20,1,0) # print 1s
    df$long_exit <- df$data <=70 # logical
    df$long_sigs <- ifelse(df$long_exit == FALSE,0,NA)  # convert FALSE to 1 and retain NAs

    desired_sum_output <- c(0,1,1,1,1,NA,NA,NA,NA,NA,NA,NA,0,0,NA,NA,NA,NA,NA,NA,0,NA)

    df <- data.frame(df,desired_sum_output)

    #sum df$long + df$long_sigs to and retain 0
    sums <- df$long + df$long_sigs # summing 

    > df
             data long long_exit long_sigs desired_sum_output
    1   80.000000    0     FALSE         0                  0
    2    7.692308    1      TRUE        NA                  1
    3    7.692308    1      TRUE        NA                  1
    4    8.333333    1      TRUE        NA                  1
    5    9.090909    1      TRUE        NA                  1
    6   20.000000    0      TRUE        NA                 NA
    7   27.272727    0      TRUE        NA                 NA
    8   50.000000    0      TRUE        NA                 NA
    9   50.000000    0      TRUE        NA                 NA
    10  21.428571    0      TRUE        NA                 NA
    11  58.333333    0      TRUE        NA                 NA
    12  46.666667    0      TRUE        NA                 NA
    13  78.064516    0     FALSE         0                  0
    14 186.153846    0     FALSE         0                  0
    15  42.857143    0      TRUE        NA                 NA
    16  44.186047    0      TRUE        NA                 NA
    17  20.000000    0      TRUE        NA                 NA
    18  25.000000    0      TRUE        NA                 NA
    19  40.000000    0      TRUE        NA                 NA
    20  45.000000    0      TRUE        NA                 NA
    21  78.000000    0     FALSE         0                  0
    22  55.000000    0      TRUE        NA                 NA

This is usually reasonable:

df$z <- rowSums(df[, c("long", "long_sigs")], na.rm=TRUE) 

However i lose my NA positioning.

Reason for retaining NA's is I will use na.locf from zoo packaged to forward fill the NA values.

Upvotes: 1

Views: 53

Answers (1)

neilfws
neilfws

Reputation: 33772

If I understand correctly, you want to sum with na.rm = TRUE where long = 1, otherwise retain the NA. So this should create z = desired_sum_output:

df$z <- ifelse(df$long == 1, 
               rowSums(df[, c("long", "long_sigs")], na.rm=TRUE), 
               rowSums(df[, c("long", "long_sigs")]))

Upvotes: 3

Related Questions