Reputation: 53
I have a dataset that has a variable ColumnStart that identifies the first column to calculate an average. I have a second variable ColumnEnd that identifies the last column in that calculation. For the first row I would like to calculate the average of the values from column 5 to column 9. the second row from column 6 to 11, etc.
The output would be:
Here is the updated dput from R:
structure(list(ID = c("AAA", "BBB", "CCC", "DDD"), ShortID = c("452L",
"3L", "4L", "324L"), Name = c("PS1", "PS2", "PS3", "PS4"), Route =
c("Internal",
"External", "Internal", "Internal"), ColumnStart = c(7L, 7L,
9L, 8L), ColumnEnd = c(9L, 11L, 13L, 10L), Date1 = c(1L, 5L,
13L, 4L), Date2 = c(2L, 6L, 45L, 3L), Date3 = c(3L, 7L, 23L,
2L), Date4 = c(4L, 8L, 65L, 1L), Date5 = c(5L, 8L, 34L, 3L),
Date6 = c(6L, 9L, 23L, 5L), Date7 = c(7L, 6L, 54L, 6L), Date8 = c(7L,
6L, 1L, 7L), Date9 = c(8L, 9L, 3L, 8L)), .Names = c("ID",
"ShortID", "Name", "Route", "ColumnStart", "ColumnEnd", "Date1",
"Date2", "Date3", "Date4", "Date5", "Date6", "Date7", "Date8",
"Date9"), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"), spec = structure(list(cols = structure(list(ID =
structure(list(), class = c("collector_character",
"collector")), ShortID = structure(list(), class =
c("collector_character",
"collector")), Name = structure(list(), class = c("collector_character",
"collector")), Route = structure(list(), class = c("collector_character",
"collector")), ColumnStart = structure(list(), class =
c("collector_integer",
"collector")), ColumnEnd = structure(list(), class =
c("collector_integer",
"collector")), Date1 = structure(list(), class = c("collector_integer",
"collector")), Date2 = structure(list(), class = c("collector_integer",
"collector")), Date3 = structure(list(), class = c("collector_integer",
"collector")), Date4 = structure(list(), class = c("collector_integer",
"collector")), Date5 = structure(list(), class = c("collector_integer",
"collector")), Date6 = structure(list(), class = c("collector_integer",
"collector")), Date7 = structure(list(), class = c("collector_integer",
"collector")), Date8 = structure(list(), class = c("collector_integer",
"collector")), Date9 = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("ID", "ShortID", "Name", "Route",
"ColumnStart", "ColumnEnd", "Date1", "Date2", "Date3", "Date4",
"Date5", "Date6", "Date7", "Date8", "Date9")), default = structure(list(),
class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Upvotes: 2
Views: 166
Reputation: 12165
Here's a base R solution that drops non-numeric columns before calculating the mean:
df$ave2 <- apply(df, 1, function(x) {
y <- as.numeric(x[seq.int(x['ColumnStart'], x['ColumnEnd'])])
mean(y[!is.na(y)])
})
df
# A tibble: 4 x 16
ID ShortID Name Route ColumnStart ColumnEnd Date1 Date2 Date3 Date4 Date5 Date6 Date7 Date8 Date9 Average
<chr> <chr> <chr> <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <dbl>
1 AAA 452L PS1 Internal 7 9 1 2 3 4 5 6 7 7 8 2
2 BBB 3L PS2 External 7 11 5 6 7 8 8 9 6 6 9 6.8
3 CCC 4L PS3 Internal 9 13 13 45 23 65 34 23 54 1 3 39.8
4 DDD 324L PS4 Internal 8 10 4 3 2 1 3 5 6 7 8 2
as.numeric
tries to convert the values to numeric
. If it cannot, then it returns NA
. We then drop NA
values and calculate the mean
.
Here's a one line version that works the same but uses na.omit
to strip out NA
values before calculating the mean:
df$Average <- apply(df, 1, function(x) mean(na.omit(as.numeric(x[seq.int(x['ColumnStart'], x['ColumnEnd'])]))))
Upvotes: 3
Reputation: 28705
Another method, not necessarily suggested
rowMeans(df*NA^!(col(df) >= df$ColumnStart & col(df) <= df$ColumnEnd),
na.rm = T)
# [1] 3.000000 7.142857 5.000000 3.333333 6.500000
Explanation:
col(df) >= df$ColumnStart & col(df) <= df$ColumnEnd
is a matrix which is TRUE
at the (i, j) indices matching the ColumnStart
, ColumnEnd
specification
NA^!(col(df) >= df$ColumnStart & col(df) <= df$ColumnEnd)
is a matrix which is 1
at the places where the above matrix is TRUE
and NA
everywhere else. Mutiplying this by df
gives a matrix which is the same as df
except all elements whose indices don't match the ColumnStart
and ColumnEnd
specification are NA
Now we can just take the rowMeans
of that, with na.rm = T
to get the desired result
Upvotes: 1