Sharif
Sharif

Reputation: 391

How to open new rows based on 2 date columns in r

I have the following sample dataset of a large dataset -

     isin       directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole
 US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09
 US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24
 US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07
 US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30
 US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01
 US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21

My QUESTION is - I want to create New Rows based on the variables ROLE_START and ROLE_END. The number of rows to be created depends the MINIMUM ROLE_START and MAXIMUM ROLE_END assuming that the data is grouped by isin and directorid. For example - for isin US6819771048 and directorid 340769, the MINIMUM ROLE_START year is 1995 and MAXIMUM ROLE_END year is 2007. So I need to open rows for each year of 1995-2007 and theses years should be stored in YEAR variable. Please note that in the above example there is no break between 1995-2007 because it is clear from ROLE_START and ROLE_END that all years are included. If there is any break between any years, then those breaks should be excluded. For the above sample dataset, my expected dataset should be look like this -

     isin       directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole YEAR
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1995
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1996
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1997
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1998
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   1999
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2000
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2001
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2002
US6819771048     340769      1970       1995     2003      M    1995-02-01  2003-01-09   2003
US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24   2003
US6819771048     340769      1970       2003     2004      M    2003-01-09  2004-02-24   2004
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2004
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2005
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2006
US6819771048     340769      1970       2004     2007      M    2004-02-24  2007-09-07   2007
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1986
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1987
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1988
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1989
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1990
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1991
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1992
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1993
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1994
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1995
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1996
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1997
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1998
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   1999
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2000
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2001
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2002
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2003
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2004
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2005
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2006
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2007
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2008
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2009
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2010
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2011
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2012
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2013
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2014
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2015
US68243Q1067      86917      1951       1986     2016      M    1986-01-01  2016-06-30   2016
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1976
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1977
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1978
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1979
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1980
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1981
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1982
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1983
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1984
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1985
US68243Q1067      86917      1951       1976     1986      M    1976-04-01  1986-01-01   1986
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2016
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2017
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2018
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2019
US68243Q1067     327069      1961       2016     2020      M    2016-06-30  2020-05-21   2020

Upvotes: 0

Views: 27

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 388982

You can create a sequence between ROLE_START and ROLE_END and get data in different rows.

library(dplyr)
df %>%
  mutate(YEAR = purrr::map2(ROLE_START, ROLE_END, seq)) %>%
  tidyr::unnest(YEAR)


#   isin         directorid dob_Year2 ROLE_START ROLE_END gender datestartrole dateendrole  YEAR
#   <chr>             <int>     <int>      <int>    <int> <chr>  <chr>         <chr>       <int>
# 1 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1995
# 2 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1996
# 3 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1997
# 4 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1998
# 5 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   1999
# 6 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2000
# 7 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2001
# 8 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2002
# 9 US6819771048     340769      1970       1995     2003 M      1995-02-01    2003-01-09   2003
#10 US6819771048     340769      1970       2003     2004 M      2003-01-09    2004-02-24   2003
# … with 52 more rows

Upvotes: 1

Related Questions