Reputation: 335
I am trying to create multiple other columns based on three columns in my data (Company
, Name
, and Year
).
I want to create multiple columns which give me the output as displayed in the table below. The following are the criteria for each column I want to create.
Number_Years:
Number of years a person has worked with the company.
7years_Span:
Number of years a person has worked with the company in the last 7 years (from the current year).
Continuous_Years:
Number of continuous years a person has worked with the company.
Years_Gap:
Number of years between that last time Person worked with the company and joining the company again.
Based on the columns Company
, Name
and Year
: I want to create other columns as defined above
Company | Name | Year | Number_Years | 7years_Span | Continuous_Years | Years_Gap |
---|---|---|---|---|---|---|
ABC | John | 2002 | 7 | 1 | 3 | 0 |
ABC | John | 2003 | 7 | 2 | 3 | 0 |
ABC | John | 2004 | 7 | 3 | 3 | 0 |
ABC | Dave | 2005 | 2 | 1 | 1 | 0 |
ABC | John | 2006 | 7 | 4 | 1 | 1 |
ABC | Dave | 2007 | 2 | 2 | 1 | 1 |
ABC | John | 2008 | 7 | 5 | 2 | 1 |
ABC | John | 2009 | 7 | 5 | 2 | 0 |
BBC | Jim | 2010 | 1 | 1 | 1 | 0 |
ABC | Jim | 2010 | 2 | 1 | 2 | 0 |
BBC | Dave | 2011 | 1 | 1 | 1 | 0 |
BBB | John | 2011 | 1 | 1 | 1 | 0 |
ABC | Jim | 2011 | 2 | 2 | 2 | 0 |
ABC | John | 2012 | 7 | 4 | 1 | 2 |
I have tried to create these columns but I have gotten nowhere near so far and I have searched on StackOverflow and other platforms with no luck. Would appreciate it if anyone can help. I hope I have provided a simple example with enough information.
Example data dput:
structure(list(Company = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 1L, 3L, 2L, 1L, 1L), .Label = c("ABC", "BBB", "BBC"
), class = "factor"), Name = structure(c(3L, 3L, 3L, 1L, 3L,
1L, 3L, 3L, 2L, 2L, 1L, 3L, 2L, 3L), .Label = c("Dave", "Jim",
"John"), class = "factor"), Year = c(2002L, 2003L, 2004L, 2005L,
2006L, 2007L, 2008L, 2009L, 2010L, 2010L, 2011L, 2011L, 2011L,
2012L), Number_Years = c(7L, 7L, 7L, 2L, 7L, 2L, 7L, 7L, 1L,
2L, 1L, 1L, 2L, 7L), `7years_span` = c(1L, 2L, 3L, 1L, 4L, 2L,
5L, 5L, 1L, 1L, 1L, 1L, 2L, 4L), Continuous_Years = c(3L, 3L,
3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L), Years_Gap = c(0L,
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA,
-14L))
Upvotes: 0
Views: 87
Reputation: 26218
Edited
df <- structure(list(Company = c("ABC", "ABC", "ABC", "ABC", "ABC",
"ABC", "ABC", "ABC", "BBC", "ABC", "BBC", "BBB", "ABC", "ABC"
), Name = c("John", "John", "John", "Dave", "John", "Dave", "John",
"John", "Jim", "Jim", "Dave", "John", "Jim", "John"), Year = c(2002L,
2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2010L,
2011L, 2011L, 2011L, 2012L)), row.names = c(NA, -14L), class = "data.frame")
df
#> Company Name Year
#> 1 ABC John 2002
#> 2 ABC John 2003
#> 3 ABC John 2004
#> 4 ABC Dave 2005
#> 5 ABC John 2006
#> 6 ABC Dave 2007
#> 7 ABC John 2008
#> 8 ABC John 2009
#> 9 BBC Jim 2010
#> 10 ABC Jim 2010
#> 11 BBC Dave 2011
#> 12 BBB John 2011
#> 13 ABC Jim 2011
#> 14 ABC John 2012
library(runner)
library(dplyr)
df %>%
group_by(Company, Name) %>%
mutate(Number_years = n(),
X7Years_span = runner(x = Year,
k = 7,
idx = Year,
f = function(x) length(x)),
Continous_years = cumsum(c(0, diff(Year)) != 1),
Years_gap = Year - lag(Year, default = first(Year) -1) -1) %>%
group_by(Continous_years, .add = T) %>%
mutate(Continous_years = n()) %>%
ungroup()
#> # A tibble: 14 x 7
#> Company Name Year Number_years X7Years_span Continous_years Years_gap
#> <chr> <chr> <int> <int> <int> <int> <dbl>
#> 1 ABC John 2002 7 1 3 0
#> 2 ABC John 2003 7 2 3 0
#> 3 ABC John 2004 7 3 3 0
#> 4 ABC Dave 2005 2 1 1 0
#> 5 ABC John 2006 7 4 1 1
#> 6 ABC Dave 2007 2 2 1 1
#> 7 ABC John 2008 7 5 2 1
#> 8 ABC John 2009 7 5 2 0
#> 9 BBC Jim 2010 1 1 1 0
#> 10 ABC Jim 2010 2 1 2 0
#> 11 BBC Dave 2011 1 1 1 0
#> 12 BBB John 2011 1 1 1 0
#> 13 ABC Jim 2011 2 2 2 0
#> 14 ABC John 2012 7 4 1 2
Created on 2021-05-25 by the reprex package (v2.0.0)
Upvotes: 1