fsure
fsure

Reputation: 335

Create columns based on multiple other columns in R

I am trying to create multiple other columns based on three columns in my data (Company, Name, and Year).

I want to create multiple columns which give me the output as displayed in the table below. The following are the criteria for each column I want to create.

Number_Years: Number of years a person has worked with the company. 7years_Span: Number of years a person has worked with the company in the last 7 years (from the current year). Continuous_Years: Number of continuous years a person has worked with the company. Years_Gap: Number of years between that last time Person worked with the company and joining the company again.

Based on the columns Company, Name and Year: I want to create other columns as defined above

Company Name Year Number_Years 7years_Span Continuous_Years Years_Gap
ABC John 2002 7 1 3 0
ABC John 2003 7 2 3 0
ABC John 2004 7 3 3 0
ABC Dave 2005 2 1 1 0
ABC John 2006 7 4 1 1
ABC Dave 2007 2 2 1 1
ABC John 2008 7 5 2 1
ABC John 2009 7 5 2 0
BBC Jim 2010 1 1 1 0
ABC Jim 2010 2 1 2 0
BBC Dave 2011 1 1 1 0
BBB John 2011 1 1 1 0
ABC Jim 2011 2 2 2 0
ABC John 2012 7 4 1 2

I have tried to create these columns but I have gotten nowhere near so far and I have searched on StackOverflow and other platforms with no luck. Would appreciate it if anyone can help. I hope I have provided a simple example with enough information.

Example data dput:

structure(list(Company = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 3L, 1L, 3L, 2L, 1L, 1L), .Label = c("ABC", "BBB", "BBC"
), class = "factor"), Name = structure(c(3L, 3L, 3L, 1L, 3L, 
1L, 3L, 3L, 2L, 2L, 1L, 3L, 2L, 3L), .Label = c("Dave", "Jim", 
"John"), class = "factor"), Year = c(2002L, 2003L, 2004L, 2005L, 
2006L, 2007L, 2008L, 2009L, 2010L, 2010L, 2011L, 2011L, 2011L, 
2012L), Number_Years = c(7L, 7L, 7L, 2L, 7L, 2L, 7L, 7L, 1L, 
2L, 1L, 1L, 2L, 7L), `7years_span` = c(1L, 2L, 3L, 1L, 4L, 2L, 
5L, 5L, 1L, 1L, 1L, 1L, 2L, 4L), Continuous_Years = c(3L, 3L, 
3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L), Years_Gap = c(0L, 
0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA, 
-14L))

Upvotes: 0

Views: 87

Answers (1)

AnilGoyal
AnilGoyal

Reputation: 26218

Edited

df <- structure(list(Company = c("ABC", "ABC", "ABC", "ABC", "ABC", 
                                 "ABC", "ABC", "ABC", "BBC", "ABC", "BBC", "BBB", "ABC", "ABC"
), Name = c("John", "John", "John", "Dave", "John", "Dave", "John", 
            "John", "Jim", "Jim", "Dave", "John", "Jim", "John"), Year = c(2002L, 
                                                                           2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2010L, 
                                                                           2011L, 2011L, 2011L, 2012L)), row.names = c(NA, -14L), class = "data.frame")
df
#>    Company Name Year
#> 1      ABC John 2002
#> 2      ABC John 2003
#> 3      ABC John 2004
#> 4      ABC Dave 2005
#> 5      ABC John 2006
#> 6      ABC Dave 2007
#> 7      ABC John 2008
#> 8      ABC John 2009
#> 9      BBC  Jim 2010
#> 10     ABC  Jim 2010
#> 11     BBC Dave 2011
#> 12     BBB John 2011
#> 13     ABC  Jim 2011
#> 14     ABC John 2012
library(runner)
library(dplyr)

df %>%
  group_by(Company, Name) %>%
  mutate(Number_years = n(),
         X7Years_span = runner(x = Year,
                               k = 7,
                               idx = Year,
                               f = function(x) length(x)),
         Continous_years = cumsum(c(0, diff(Year)) != 1),
         Years_gap = Year - lag(Year, default = first(Year) -1) -1) %>%
  group_by(Continous_years, .add = T) %>%
  mutate(Continous_years = n()) %>%
  ungroup()
#> # A tibble: 14 x 7
#>    Company Name   Year Number_years X7Years_span Continous_years Years_gap
#>    <chr>   <chr> <int>        <int>        <int>           <int>     <dbl>
#>  1 ABC     John   2002            7            1               3         0
#>  2 ABC     John   2003            7            2               3         0
#>  3 ABC     John   2004            7            3               3         0
#>  4 ABC     Dave   2005            2            1               1         0
#>  5 ABC     John   2006            7            4               1         1
#>  6 ABC     Dave   2007            2            2               1         1
#>  7 ABC     John   2008            7            5               2         1
#>  8 ABC     John   2009            7            5               2         0
#>  9 BBC     Jim    2010            1            1               1         0
#> 10 ABC     Jim    2010            2            1               2         0
#> 11 BBC     Dave   2011            1            1               1         0
#> 12 BBB     John   2011            1            1               1         0
#> 13 ABC     Jim    2011            2            2               2         0
#> 14 ABC     John   2012            7            4               1         2

Created on 2021-05-25 by the reprex package (v2.0.0)

Upvotes: 1

Related Questions