Reputation: 135
I have a dataframe that is structured as follows:
structure(list(CT_CW.QA.RWL.H1A1Y = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A1Z = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
CT_CW.QA.RWL.H1A2Y = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A2Z = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A3Y = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A3Z = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A4Y = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A4Z = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A5Y = c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_
), CT_CW.QA.RWL.H1A5Z = c(1.07, 0.41, 0.87, 1.21, 0.99, 0.77,
0.73, 0.77, 0.61, 0.89), CT_CW.QA.RWL.H1A6Y = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A6Z = c(NA,
NA, 0.92, 0.64, 0.63, 0.48, 0.17, 0.28, 0.32, 0.64), CT_CW.QA.RWL.H1A7Y = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A7Z = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), CT_CW.QA.RWL.H1A8Y = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_)), row.names = c("1812", "1813",
"1814", "1815", "1816", "1817", "1818", "1819", "1820", "1821"
), class = "data.frame")
What I would like to do is, for each column, get the cumulative count of the number of rows excluding NAs (it's necessary to leave the NAs in at this point).
I've tried the following (where test
is the dataframe above):
test_count = cumsum(colSums(!is.na(test)))
But this seems to be continuing the count across columns, whereas I need a unique cumulative count of the number of rows in each column, so that the result is a dataframe that looks like (this is for visual reference only, the numbers are made up):
Row.Name CT_CW.QA.RWL.H1A1Y CT_CW.QA.RWL.H1A2Y
1812 0 0
1813 0 1
1814 1 2
...indicating that the original dataframe (test
) had NAs in the first two rows of columns CT_CW.QA.RWL.H1A1Y, and an NA in the first row of the column CT.CW.QA.RWL.H1A2Y (again, this hypothetical doesn't represent the values in the data above, it's just to illustrate the structure of what I'm looking for)
Upvotes: 1
Views: 109
Reputation: 887311
We can loop over the column with lapply
, convert to logical, do the cumsum
and assign the output back to the original or a copy of the original object. Make sure to use []
to preserve the attributes
test1 <- test
test1[] <- lapply(test, function(x) cumsum(!is.na(x)))
-output
head(test1)
CT_CW.QA.RWL.H1A1Y CT_CW.QA.RWL.H1A1Z CT_CW.QA.RWL.H1A2Y CT_CW.QA.RWL.H1A2Z CT_CW.QA.RWL.H1A3Y CT_CW.QA.RWL.H1A3Z CT_CW.QA.RWL.H1A4Y
1812 0 0 0 0 0 0 0
1813 0 0 0 0 0 0 0
1814 0 0 0 0 0 0 0
1815 0 0 0 0 0 0 0
1816 0 0 0 0 0 0 0
1817 0 0 0 0 0 0 0
CT_CW.QA.RWL.H1A4Z CT_CW.QA.RWL.H1A5Y CT_CW.QA.RWL.H1A5Z CT_CW.QA.RWL.H1A6Y CT_CW.QA.RWL.H1A6Z CT_CW.QA.RWL.H1A7Y CT_CW.QA.RWL.H1A7Z
1812 0 0 1 0 0 0 0
1813 0 0 2 0 0 0 0
1814 0 0 3 0 1 0 0
1815 0 0 4 0 2 0 0
1816 0 0 5 0 3 0 0
1817 0 0 6 0 4 0 0
CT_CW.QA.RWL.H1A8Y
1812 0
1813 0
1814 0
1815 0
1816 0
1817 0
....
There is a colCumsums
from matrixStats
library(matrixStats)
test1[] <- colCumsums(!is.na(test))
The issue with cumsum
returning a vector is because colSums
is a single vector
of one observation per each column, and cumsum
on it returns the cumulative sum of column sums instead of cumulative sum inside each column
Upvotes: 1