Reputation: 391
I have the following sample dataset of a large dataset -
isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21
My QUESTION is -
I want to create New Rows based on the variables ROLE_START
and ROLE_END
. The number of
rows to be created depends the MINIMUM ROLE_START
and MAXIMUM ROLE_END
assuming that the data is grouped by isin
and directorid
. For example - for isin
US6819771048 and directorid
340769, the MINIMUM ROLE_START
year is 1995 and MAXIMUM ROLE_END
year is 2007. So I need to open rows for each year of 1995-2007 and theses years should be stored in YEAR
variable. Please note that in the above example there is no break between 1995-2007 because it is clear from ROLE_START
and ROLE_END
that all years are included. If there is any break between any years, then those breaks should be excluded. For the above sample dataset, my expected dataset should be look like this -
isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1995
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1996
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1997
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1998
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1999
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2000
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2001
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2002
US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2003
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2003
US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2004
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2004
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2005
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2006
US6819771048 340769 1970 2004 2007 M 2004-02-24 2007-09-07 2007
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1986
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1987
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1988
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1989
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1990
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1991
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1992
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1993
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1994
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1995
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1996
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1997
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1998
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 1999
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2000
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2001
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2002
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2003
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2004
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2005
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2006
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2007
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2008
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2009
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2010
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2011
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2012
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2013
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2014
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2015
US68243Q1067 86917 1951 1986 2016 M 1986-01-01 2016-06-30 2016
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1976
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1977
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1978
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1979
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1980
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1981
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1982
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1983
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1984
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1985
US68243Q1067 86917 1951 1976 1986 M 1976-04-01 1986-01-01 1986
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2016
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2017
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2018
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2019
US68243Q1067 327069 1961 2016 2020 M 2016-06-30 2020-05-21 2020
Upvotes: 0
Views: 27
Reputation: 388982
You can create a sequence between ROLE_START
and ROLE_END
and get data in different rows.
library(dplyr)
df %>%
mutate(YEAR = purrr::map2(ROLE_START, ROLE_END, seq)) %>%
tidyr::unnest(YEAR)
# isin directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
# <chr> <int> <int> <int> <int> <chr> <chr> <chr> <int>
# 1 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1995
# 2 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1996
# 3 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1997
# 4 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1998
# 5 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 1999
# 6 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2000
# 7 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2001
# 8 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2002
# 9 US6819771048 340769 1970 1995 2003 M 1995-02-01 2003-01-09 2003
#10 US6819771048 340769 1970 2003 2004 M 2003-01-09 2004-02-24 2003
# … with 52 more rows
Upvotes: 1