Geet
Geet

Reputation: 2575

Fill a column using range in other columns

Here is my data frame. For all the page_nums between section_start and section_end, I want to create fill_section_num with section_num values. For example, if the page_nums is between 5 through 6, the section_num is 2. So, in the fill_section_num, I want 2 for 5th and 6th row and so on.

df <- tribble(
  ~page_nums, ~section_start, ~section_end, ~section_num, ~fill_section_num,
       1,      NA_integer_,    NA_integer_,  NA_integer_,   NA_integer_,
       2,      NA_integer_,    NA_integer_,  NA_integer_,      1,
       3,           2,              4,            1,           1,
       4,           5,              6,            2,           1,
       5,           7,              7,            3,           2,
       6,           8,              10,           4,           2,
       7,      NA_integer_,     NA_integer_,  NA_integer_,     3,
       8,      NA_integer_,     NA_integer_,  NA_integer_,     4,
       9,      NA_integer_,     NA_integer_,  NA_integer_,     4,
       10,     NA_integer_,     NA_integer_,  NA_integer_,     4)

I was planing to do something like this, but failed:

 df <- df %>% 
          mutate(fill_section_num = if_else(between(page_nums, section_start, section_end), section_num, NA_real_))

The desired output is the fill_section_num column calculated using the R code.

Upvotes: 0

Views: 63

Answers (2)

Onyambu
Onyambu

Reputation: 79228

You can expand the section start:section_end, then unnest and finally join the table to itself:

df%>%
  mutate(nn=map2(section_start,section_end,
                 ~if(any(is.na(c(.x,.y)))) NA else .x:.y))%>%
  unnest()%>%
  select(nn,section_num)%>%
  right_join(df,c(nn="page_nums"))%>%
  select(-section_num.x,everything())
# A tibble: 10 x 6
      nn section_start section_end section_num.y fill_section_num section_num.x
   <dbl>         <dbl>       <dbl>         <dbl>            <dbl>         <dbl>
 1     1            NA          NA            NA               NA            NA
 2     2            NA          NA            NA                1             1
 3     3             2           4             1                1             1
 4     4             5           6             2                1             1
 5     5             7           7             3                2             2
 6     6             8          10             4                2             2
 7     7            NA          NA            NA                3             3
 8     8            NA          NA            NA                4             4
 9     9            NA          NA            NA                4             4
10    10            NA          NA            NA                4             4

The last column above is what you are looking for.

Upvotes: 1

DanY
DanY

Reputation: 6073

One way to proceed:

# example data (with simpler varnames because I'm lazy):
df <- data.frame(
    pages = 3:6,
    start = c(2,5,7,8),
    end   = c(4,6,7,10),
    section = 1:4
)

# create two-column dataframe of all page numbers and the fill_section they belong to
myfun <- function(x) {
    cbind( pages = seq(x[1], x[2]), 
           full_section = rep(x[3], times=x[2]-x[1]+1) )
}

temp_list <- apply(t(df[,2:4]), 2, myfun)
temp_df   <- data.frame(do.call(rbind, temp_list))

# Merge this fill_section information back onto the original dataframe:
result <- merge(df, temp_df, by="pages")

Check that the result makes sense:

result
  pages start end section fill_section
      3     2   4       1            1
      4     5   6       2            1
      5     7   7       3            2
      6     8  10       4            2

Upvotes: 1

Related Questions