Reputation: 209
I'm fairly new to R and have encountered an interesting problem. My dataset has a lot of NA's and handling them has become a bit problematic.
What I have:
S1 S2 S3 S4
NA NA 3 4
1 3 NA NA
3 NA 5 NA
What I want to do: Calculate Diff = S4 - S3 - S2 - S1. But if S4 is NA then it should do Diff = S3 - S2 - S1. And so on... But if lets say, S4 and S2 are NA's then the Diff should be S3 - S1.
For example in row 2 the answer should be 2. (3-1)
What I need:
S1 S2 S3 S4 Diff
NA NA 3 4 1
1 3 NA NA 2
3 NA 5 NA 2
What I have written up till now: (but doesn't work for handling NA's)
data <- cbind(data, Diff = data$S4 - data$S3 -data$S2 -data$S1)
Additional Info: The values under each column in the original problem are timestamps.
What the data looks like:
head(select(data,S4,S3,S2,S1))
S4 S3 S2 S1
1 2020-07-08 07:46:36 NA 2020-07-08 07:08:31 NA
2 2020-07-08 07:11:57 NA 2020-07-08 07:08:53 NA
3 2020-07-08 07:12:50 NA 2020-07-08 07:09:40 NA
4 2020-07-08 07:13:14 NA 2020-07-08 07:09:42 NA
5 2020-07-08 07:13:46 NA 2020-07-08 07:10:36 NA
6 2020-07-08 07:44:45 NA 2020-07-08 07:11:42 NA
Updated Version of dataset: (Contains the most accurate representation of the dataset - timestamps are converted to seconds.) What I need? The "Diff" column.
S4 S3 S2 S1 Diff
1 NA 30 40 NA 10
1 NA 30 NA 40 10
1 10 30 40 50 40
1 30 NA 40 NA 10
1 10 30 40 NA 30
1 NA 30 40 50 20
Found the answer here: Take difference between first and last observations in a row, where each row is different
Upvotes: 0
Views: 75
Reputation: 101099
Here is a base R option via apply
df$Diff <- apply(df,1,function(x) 2*tail(na.omit(x),1)-sum(x,na.rm = TRUE))
or
df$Diff <- apply(df,1,function(x) Reduce("-",rev(na.omit(x))))
such that
> df
S1 S2 S3 S4 Diff
1 NA NA 3 4 1
2 1 3 NA NA 2
3 3 NA 5 NA 2
Data
> dput(df)
structure(list(S1 = c(NA, 1L, 3L), S2 = c(NA, 3L, NA), S3 = c(3L,
NA, 5L), S4 = c(4L, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
When with timestamp values, maybe you can try the code below
df$Diff <- apply(df,1,function(x) Reduce("-",as.POSIXlt(rev(na.omit(x)))))
such that
> df
S1 S2 S3 S4 Diff
1 NA 2020-07-08 07:08:31 NA 2020-07-08 07:46:36 38.083333
2 NA 2020-07-08 07:08:53 NA 2020-07-08 07:11:57 3.066667
3 NA 2020-07-08 07:09:40 NA 2020-07-08 07:12:50 3.166667
4 NA 2020-07-08 07:09:42 NA 2020-07-08 07:13:14 3.533333
5 NA 2020-07-08 07:10:36 NA 2020-07-08 07:13:46 3.166667
6 NA 2020-07-08 07:11:42 NA 2020-07-08 07:44:45 33.050000
Data
> dput(df)
structure(list(S1 = c(NA, NA, NA, NA, NA, NA), S2 = c("2020-07-08 07:08:31",
"2020-07-08 07:08:53", "2020-07-08 07:09:40", "2020-07-08 07:09:42",
"2020-07-08 07:10:36", "2020-07-08 07:11:42"), S3 = c(NA, NA,
NA, NA, NA, NA), S4 = c("2020-07-08 07:46:36", "2020-07-08 07:11:57",
"2020-07-08 07:12:50", "2020-07-08 07:13:14", "2020-07-08 07:13:46",
"2020-07-08 07:44:45")), row.names = c("1", "2", "3", "4", "5",
"6"), class = "data.frame")
Upvotes: 2