Keith W. Larson
Keith W. Larson

Reputation: 1573

How do I replace NA in one column if another column's value is greater than 0?

I have 8 census lines (L1:L8). Currently some of the records have NA rather than 0 when they have been censused. I would like to replace all NA with 0 in each of the columns (L1:L8) when their corresponding effort column (EFFORT_L1:EFFORT_L8) has a value greater than 0 (meaning they have been censused).

Sample Data:

df <-structure(list(KARTA = c("02C2H", "02C2H", "02C2H", "02C2H", 
"02C2H", "02C2H"), YEAR = c(1997L, 1997L, 1997L, 1997L, 1997L, 
1997L), ART = c("009", "031", "012", "057", "065", "073"), L1 = c(NA, 
NA, NA, NA, 2, NA), L2 = c(NA, NA, 7, NA, 3, NA), L3 = c(NA, 
NA, NA, NA, 1, NA), L4 = c(NA, NA, NA, NA, 1, NA), L5 = c(NA, 
NA, NA, NA, 1, NA), L6 = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), L7 = c(NA, NA, NA, 1, NA, 1), L8 = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), EFFORT_L1 = c(10, 
10, 10, 10, 10, 10), EFFORT_L2 = c(10, 10, 10, 10, 10, 10), EFFORT_L3 = c(9.625, 
9.625, 9.625, 9.625, 9.625, 9.625), EFFORT_L4 = c(10, 10, 10, 
10, 10, 10), EFFORT_L5 = c(9.125, 9.125, 9.125, 9.125, 9.125, 
9.125), EFFORT_L6 = c(9.75, 9.75, 9.75, 9.75, 9.75, 9.75), EFFORT_L7 = c(9.75, 
9.75, 9.75, 9.75, 9.75, 9.75), EFFORT_L8 = c(10, 10, 10, 10, 
10, 10), Total_Route_Effort = c(78.25, 78.25, 78.25, 78.25, 78.25, 
78.25)), .Names = c("KARTA", "YEAR", "ART", "L1", "L2", "L3", 
"L4", "L5", "L6", "L7", "L8", "EFFORT_L1", "EFFORT_L2", "EFFORT_L3", 
"EFFORT_L4", "EFFORT_L5", "EFFORT_L6", "EFFORT_L7", "EFFORT_L8", 
"Total_Route_Effort"), row.names = c(NA, 6L), class = "data.frame")

Sample code for a single column (note I am seeking an efficient solution for all eight columns):

df[is.na(df[,"L1"]) & df[,"EFFORT_L1"] > 0, "L1"] <- 0

Upvotes: 1

Views: 1003

Answers (3)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193497

This doesn't exactly answer your question, but might help you with some future problems: Having looked at both your questions from today, Have you considered converting your data into a semi-long format and working with that instead?

Here's a toy example:

Sample data

set.seed(1)
myDF <- data.frame(
  ID1 = sample(letters[1:5], 5, replace = TRUE),
  ID2 = 1:5, ID3 = "999",
  V1 = 99, V2 = 99, V3 = 99,
  EV1 = sample(0:5, 5, replace = TRUE),
  EV2 = sample(0:3, 5, replace = TRUE),
  EV3 = sample(0:2, 5, replace = TRUE),
  stringsAsFactors = FALSE
)
myDF$ID3[c(1, 4)] <- 100
myDF$V1[c(4, 5)] <- 100
myDF$V2[c(1, 3, 5)] <- 100
myDF
#   ID1 ID2 ID3  V1  V2 V3 EV1 EV2 EV3
# 1   b   1 100  99 100 99   5   0   1
# 2   b   2 999  99  99 99   5   0   2
# 3   c   3 999  99 100 99   3   2   2
# 4   e   4 100 100  99 99   3   1   1
# 5   b   5 999 100 100 99   0   3   2

Data in a semi-long (or semi-wide, depending on your perspective) format

myDFLong <- reshape(myDF, direction = "long", idvar = 1:3,
                    varying = 4:ncol(myDF), sep = "")
myDFLong
#           ID1 ID2 ID3 time   V EV
# b.1.100.1   b   1 100    1  99  5
# b.2.999.1   b   2 999    1  99  5
# c.3.999.1   c   3 999    1  99  3
# e.4.100.1   e   4 100    1 100  3
# b.5.999.1   b   5 999    1 100  0
# b.1.100.2   b   1 100    2 100  0
# b.2.999.2   b   2 999    2  99  0
# c.3.999.2   c   3 999    2 100  2
# e.4.100.2   e   4 100    2  99  1
# b.5.999.2   b   5 999    2 100  3
# b.1.100.3   b   1 100    3  99  1
# b.2.999.3   b   2 999    3  99  2
# c.3.999.3   c   3 999    3  99  2
# e.4.100.3   e   4 100    3  99  1
# b.5.999.3   b   5 999    3  99  2

Notice that we now have just one column for the equivalent of your "L" columns and one column for the equivalent of your "EFFORT_L" columns. A "time" variable has been created (equivalent to your 1-8 "census lines").

Answering both your questions from today

Both your questions so far from today can then easily be addressed with some simple ifelse statements.

# Your first question from today
myDFLong$V <- with(myDFLong, ifelse(ID3 == 999 & V == 99, NA, V))
# Continuation from that point
myDFLong$V <- with(myDFLong, ifelse(EV > 0 & is.na(V), 0, V))
myDFLong
#           ID1 ID2 ID3 time   V EV
# b.1.100.1   b   1 100    1  99  5
# b.2.999.1   b   2 999    1   0  5
# c.3.999.1   c   3 999    1   0  3
# e.4.100.1   e   4 100    1 100  3
# b.5.999.1   b   5 999    1 100  0
# b.1.100.2   b   1 100    2 100  0
# b.2.999.2   b   2 999    2  NA  0
# c.3.999.2   c   3 999    2 100  2
# e.4.100.2   e   4 100    2  99  1
# b.5.999.2   b   5 999    2 100  3
# b.1.100.3   b   1 100    3  99  1
# b.2.999.3   b   2 999    3   0  2
# c.3.999.3   c   3 999    3   0  2
# e.4.100.3   e   4 100    3  99  1
# b.5.999.3   b   5 999    3   0  2

Final stages: going back to a wide format if you prefer

You can re-convert to a wide format using base R, but in this case, it would be much easier to use the "reshape2" package as follows:

library(reshape2)
myDF2 <- melt(myDFLong, id.vars=1:4)
myDFFinal <- dcast(myDF2, ID1 + ID2 + ID3 ~ variable + time)
myDFFinal
#   ID1 ID2 ID3 V_1 V_2 V_3 EV_1 EV_2 EV_3
# 1   b   1 100  99 100  99    5    0    1
# 2   b   2 999   0  NA   0    5    0    2
# 3   b   5 999 100 100   0    0    3    2
# 4   c   3 999   0 100   0    3    2    2
# 5   e   4 100 100  99  99    3    1    1

However, I would suggest that you do that only at the very end--a lot of things like plotting functions and so on like data to be in a long or semi-long format to begin with, so it might be worth your time considering it for your data.

Do take care, though, because your data currently has named rows--you would need to add those in as a column to your data to make best use of them as additional ID variables.

Upvotes: 3

juba
juba

Reputation: 49033

If the number of "L" variables is not fixed, you can use this :

l.vars <- grep("^L\\d$", names(df),value=TRUE)
for (v in l.vars) {
  effort.var <- paste0("EFFORT_", v)
  df[is.na(df[,v]) & df[,effort.var] > 0, v] <- 0
}

Upvotes: 1

Sven Hohenstein
Sven Hohenstein

Reputation: 81683

df[paste0("L", 1:8)][is.na(df[paste0("L", 1:8)]) 
                     & df[paste0("EFFORT_L", 1:8)] > 0] <- 0

The result:

> df
  KARTA YEAR ART L1 L2 L3 L4 L5 L6 L7 L8 EFFORT_L1 EFFORT_L2
1 02C2H 1997 009  0  0  0  0  0  0  0  0        10        10
2 02C2H 1997 031  0  0  0  0  0  0  0  0        10        10
3 02C2H 1997 012  0  7  0  0  0  0  0  0        10        10
4 02C2H 1997 057  0  0  0  0  0  0  1  0        10        10
5 02C2H 1997 065  2  3  1  1  1  0  0  0        10        10
6 02C2H 1997 073  0  0  0  0  0  0  1  0        10        10
  EFFORT_L3 EFFORT_L4 EFFORT_L5 EFFORT_L6 EFFORT_L7 EFFORT_L8
1     9.625        10     9.125      9.75      9.75        10
2     9.625        10     9.125      9.75      9.75        10
3     9.625        10     9.125      9.75      9.75        10
4     9.625        10     9.125      9.75      9.75        10
5     9.625        10     9.125      9.75      9.75        10
6     9.625        10     9.125      9.75      9.75        10
  Total_Route_Effort
1              78.25
2              78.25
3              78.25
4              78.25
5              78.25
6              78.25

Upvotes: 4

Related Questions