1qazzaq1
1qazzaq1

Reputation: 3

Fill out blank rows within a column in R

We currently have made a column called Change with 0 at certain rows, indicating the change of a new manager within a firm .

Firm Manager Quarter Change
1 1 31.12.2018
1 1 31.03.2019
1 1 30.06.2019
1 1 30.09.2019
1 2 31.12.2019 0
1 2 31.03.2020
1 2 30.06.2020
1 2 30.09.2020

Is there some type of function in R that can fill out the blank rows, as depicted in the table below? The length of how long the different manager works within a certain firm varies.

Firm Manager Quarter Eventdate
1 1 31.12.2018 -4
1 1 31.03.2019 -3
1 1 30.06.2019 -2
1 1 30.09.2019 -1
1 2 31.12.2019 0
1 2 31.03.2020 1
1 2 30.06.2020 2
1 2 30.09.2020 3

Apologies for the poor explanation. Any help would be extremely appreciated.

Upvotes: 0

Views: 67

Answers (2)

Gregor Thomas
Gregor Thomas

Reputation: 146164

Using Wimpel's nicely copy/pasteable sample data, here are dplyr and data.table solutions based on subtracting row numbers by group. It should work when there is a single row where Change is 0 per Firm.

DT[, Eventdate := .I - which(Change == 0), by = .(Firm)]
DT
#    Firm Manager    Quarter Change Eventdate
# 1:    1       1 31.12.2018     NA        -4
# 2:    1       1 31.03.2019     NA        -3
# 3:    1       1 30.06.2019     NA        -2
# 4:    1       1 30.09.2019     NA        -1
# 5:    1       2 31.12.2019      0         0
# 6:    1       2 31.03.2020     NA         1
# 7:    1       2 30.06.2020     NA         2
# 8:    1       2 30.09.2020     NA         3

library(dplyr)
DT %>% group_by(Firm) %>%
  mutate(Eventdate = row_number() - which(Change == 0))
# # A tibble: 8 x 5
# # Groups:   Firm [1]
#    Firm Manager Quarter    Change Eventdate
#   <int>   <int> <chr>       <int>     <int>
# 1     1       1 31.12.2018     NA        -4
# 2     1       1 31.03.2019     NA        -3
# 3     1       1 30.06.2019     NA        -2
# 4     1       1 30.09.2019     NA        -1
# 5     1       2 31.12.2019      0         0
# 6     1       2 31.03.2020     NA         1
# 7     1       2 30.06.2020     NA         2
# 8     1       2 30.09.2020     NA         3

Upvotes: 1

Wimpel
Wimpel

Reputation: 27792

Here is a data.table approach using a rolling join.. It seems to work fine on the sample data provided. change2 is the column you are looking for.

library( data.table )
DT <- fread("Firm   Manager     Quarter     Change
1   1   31.12.2018  
1   1   31.03.2019  
1   1   30.06.2019  
1   1   30.09.2019  
1   2   31.12.2019  0
1   2   31.03.2020  
1   2   30.06.2020  
1   2   30.09.2020  ")

#add id column
DT[, id := .I ]
#set key to join on
setkey( DT, id )
#perform rolling join to nearest Change == 0, get id and subtract
DT[, change2 := id - DT[Change == 0, ][DT, x.id, roll = "nearest" ] ]

#    Firm Manager    Quarter Change id change2
# 1:    1       1 31.12.2018     NA  1      -4
# 2:    1       1 31.03.2019     NA  2      -3
# 3:    1       1 30.06.2019     NA  3      -2
# 4:    1       1 30.09.2019     NA  4      -1
# 5:    1       2 31.12.2019      0  5       0
# 6:    1       2 31.03.2020     NA  6       1
# 7:    1       2 30.06.2020     NA  7       2
# 8:    1       2 30.09.2020     NA  8       3

I left in the temporary columns, so you can see what is going on here.

Upvotes: 1

Related Questions