Reputation: 51
In R, I have a dataframe that looks like this:
2021 2020 2019 2018 2017 2015 2010 2006 2002 1998 1994 1990
1 6 6 6 6 4 6 6 6 6 6 6 6
2 6 6 6 6 6 6 6 6 6 3 4 4
3 7 6 6 6 4 6 6 6 6 6 6 6
4 6 6 6 6 6 6 6 6 6 6 6 6
5 4 4 7 6 4 6 6 6 6 6 4 6
6 6 6 6 6 6 4 6 6 6 2 6 6
...
What I want to do is count how many consecutive values there are from the first column (including the first column), and put the result in a new dataframe called 'age'. So the desired output from the above example would look like this:
age
1 4
2 9
3 1
4 12
5 2
6 5
...
I've been playing around with rle
, but have had no luck so far. Any ideas?
EDIT to add reproducible data:
df <- structure(list(LC_2021 = c(6, 6, 6, 6, 6, 6), LC_2020 = c(6,
6, 6, 6, 4, 6), LC_2019 = c(6, 6, 6, 6, 6, 6), LC_2018 = c(6,
6, 6, 6, 6, 6), LC_2017 = c(6, 6, 6, 6, 6, 6), LC_2015 = c(6,
6, 6, 6, 6, 6), LC_2010 = c(6, 6, 6, 6, 6, 6), LC_2006 = c(4,
4, 4, 6, 4, 4), LC_2002 = c(6, 6, 6, 6, 6, 6), LC_1998 = c(6,
6, 6, 6, 6, 6), LC_1994 = c(6, 6, 6, 6, 6, 6), LC_1990 = c(6,
6, 6, 6, 6, 6)), row.names = c(NA, 6L), class = "data.frame")
Upvotes: 1
Views: 202
Reputation: 269654
Assuming that -Inf is not in DF
we can use max.col
. We create a temporary data frame that has a column of -Inf's to catch any row that is entirely constant. No apply
functions or packages are used.
data.frame(age = max.col(cbind(DF, -Inf) != DF[[1]], "first") - 1)
## age
## 1 4
## 2 9
## 3 1
## 4 12
## 5 2
## 6 5
Lines <- "2021 2020 2019 2018 2017 2015 2010 2006 2002 1998 1994 1990
1 6 6 6 6 4 6 6 6 6 6 6 6
2 6 6 6 6 6 6 6 6 6 3 4 4
3 7 6 6 6 4 6 6 6 6 6 6 6
4 6 6 6 6 6 6 6 6 6 6 6 6
5 4 4 7 6 4 6 6 6 6 6 4 6
6 6 6 6 6 6 4 6 6 6 2 6 6"
DF <- read.table(text = Lines, check.names = FALSE)
Upvotes: 2
Reputation: 52004
With cummin
:
data.frame(age = apply(df, 1, \(x) sum(cummin(x == x[1]))))
# age
# 1 4
# 2 9
# 3 1
# 4 12
# 5 2
# 6 5
Upvotes: 5
Reputation: 19097
You can use apply
to iterate through the rows, then extract the lengths from rle
.
data.frame(age = apply(df, 1, \(x) rle(x)$lengths[[1]]))
age
1 4
2 9
3 1
4 12
5 2
6 5
df <- structure(list(X2021 = c(6L, 6L, 7L, 6L, 4L, 6L), X2020 = c(6L,
6L, 6L, 6L, 4L, 6L), X2019 = c(6L, 6L, 6L, 6L, 7L, 6L), X2018 = c(6L,
6L, 6L, 6L, 6L, 6L), X2017 = c(4L, 6L, 4L, 6L, 4L, 6L), X2015 = c(6L,
6L, 6L, 6L, 6L, 4L), X2010 = c(6L, 6L, 6L, 6L, 6L, 6L), X2006 = c(6L,
6L, 6L, 6L, 6L, 6L), X2002 = c(6L, 6L, 6L, 6L, 6L, 6L), X1998 = c(6L,
3L, 6L, 6L, 6L, 2L), X1994 = c(6L, 4L, 6L, 6L, 4L, 6L), X1990 = c(6L,
4L, 6L, 6L, 6L, 6L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
Upvotes: 6