Reputation: 71
I am stuck transforming an existing data frame in R using dplyr (but I am open to other options). I am running out of ideas, and nothing brings me closer to the required result. The data frame looks like this:
data.frame("group" = c('a', 'a', 'b', 'b', 'c', 'c'),
           "condition" = c(1, 2, 1, 2, 1, 2),
           "X1" = c(2010, 'x', 2011, 'x', 2010, 'x'),
           "X2" = c(2011, 'x', 2012, 'x', 2011, 'x'),
           "X3" = c(2012, 'x', 2013, 'x', 2012, 'x'),
           "X4" = c(2013, 'x', 2014, 'x', 2013, 'x'),
           "X5" = c(2014, '', 2015, 'x', 2014, 'x'),
           "X6" = c(2015, '', 2015, '', 2015, ''))
For each group, the new data frame should show the earliest and latest year (taken from the condition 1 row) among the columns that contain an 'x' in the condition 2 row.
The result should look like:
data.frame("group" = c('a', 'b', 'c' ), "min"= c(2010, 2011, 2010), "max" = c(2013, 2015, 2014))
Upvotes: 3
Views: 1623
Reputation: 30559
With tidyverse you can try the following approach. First, put your data into long form targeting your year columns. Then, group_by both group and name (which holds the original column name, one per year column), keep only the subgroups that have a value of x, and keep the rows that have a condition of 1. Then group_by just group and summarise to get the min and max years. Note that you may wish to convert your year data to numeric after removing the x values by filtering on condition.
library(tidyverse)

df1 %>%
  pivot_longer(cols = -c(group, condition)) %>%
  group_by(group, name) %>%
  filter(any(value == "x"), condition == 1) %>%
  group_by(group) %>%
  summarise(min = min(value),
            max = max(value))
Output
# A tibble: 3 x 3
  group min   max
  <chr> <chr> <chr>
1 a     2010  2013
2 b     2011  2015
3 c     2010  2014
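The note above about converting the years to numeric could be sketched as follows. This is only a minimal variant of the same pipeline, assuming the question's data is stored as df1; the name result is introduced here for illustration. Because the filter has already dropped every x row, as.numeric() only ever sees year strings.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, assumed to be stored as df1
df1 <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  condition = c(1, 2, 1, 2, 1, 2),
  X1 = c(2010, "x", 2011, "x", 2010, "x"),
  X2 = c(2011, "x", 2012, "x", 2011, "x"),
  X3 = c(2012, "x", 2013, "x", 2012, "x"),
  X4 = c(2013, "x", 2014, "x", 2013, "x"),
  X5 = c(2014, "", 2015, "x", 2014, "x"),
  X6 = c(2015, "", 2015, "", 2015, ""),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  pivot_longer(cols = -c(group, condition)) %>%
  group_by(group, name) %>%
  # keep the year row (condition 1) of every column marked "x" in condition 2
  filter(any(value == "x"), condition == 1) %>%
  group_by(group) %>%
  # only year strings remain at this point, so the conversion is safe
  summarise(min = min(as.numeric(value)),
            max = max(as.numeric(value)))

print(result)
```

With this change the min and max columns come back as numbers rather than character strings, which matters if they are later used in arithmetic or numeric comparisons.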
Upvotes: 2
Reputation: 4358
In base R:
results <- df[df$condition == 1, 1:2]
results <- cbind(results,
                 t(apply(df[df$condition == 1, 3:ncol(df)], 1,
                         function(x) c(Min = min(x), Max = max(x)))))
  group condition  Min  Max
1     a         1 2010 2015
3     b         1 2011 2015
5     c         1 2010 2015
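The Min and Max above are taken over every year column, so each group reports 2015 as its maximum, which differs from the expected result in the question. One possible base R sketch that also uses the condition 2 row is shown below. It assumes each group has exactly one condition 1 row and one condition 2 row; the names years, marks, year_cols and result are introduced here for illustration only.

```r
# Sample data from the question, assumed to be stored as df
df <- data.frame(
  group = c("a", "a", "b", "b", "c", "c"),
  condition = c(1, 2, 1, 2, 1, 2),
  X1 = c(2010, "x", 2011, "x", 2010, "x"),
  X2 = c(2011, "x", 2012, "x", 2011, "x"),
  X3 = c(2012, "x", 2013, "x", 2012, "x"),
  X4 = c(2013, "x", 2014, "x", 2013, "x"),
  X5 = c(2014, "", 2015, "x", 2014, "x"),
  X6 = c(2015, "", 2015, "", 2015, ""),
  stringsAsFactors = FALSE
)

year_cols <- setdiff(names(df), c("group", "condition"))

years <- df[df$condition == 1, ]  # rows holding the years
marks <- df[df$condition == 2, ]  # rows holding the "x" marks
# align the mark rows with the year rows by group
marks <- marks[match(years$group, marks$group), ]

result <- do.call(rbind, lapply(seq_len(nrow(years)), function(i) {
  # columns flagged with "x" for this group
  has_x <- unlist(marks[i, year_cols]) == "x"
  # years of the flagged columns only, converted to numeric
  yrs <- as.numeric(unlist(years[i, year_cols])[has_x])
  data.frame(group = years$group[i], min = min(yrs), max = max(yrs))
}))

print(result)
```

A group whose condition 2 row contains no x at all would make min() and max() warn and return Inf and -Inf, so that case would need separate handling.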
Upvotes: 1