How to fill in blank rows and combine split lines in R

Question

I have a data frame extracted from a PDF in which some text that should be on one line is split across a varying number of rows, like this:

df_missing = data.frame(group = c("East","","","West","","",""), 
                        order = c("this","is supposed to be","one line","this","is supposed to be","one line","too"))

How can I collapse the split lines so that the data frame looks like this?

df_correct = data.frame(group = c("East","West"), order = c("this is supposed to be one line", "this is supposed to be one line too"))

akrun · Accepted Answer

We can do this in several ways. One is to build a grouping variable from the cumulative sum of a logical vector that is TRUE wherever 'group' is non-blank, then summarise 'order' by pasting its elements together within each group:

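library(dplyr)
df_missing %>%
  # group1 increases by 1 at every non-blank 'group', so it identifies each record
  group_by(group1 = cumsum(group != "")) %>%
  # keep the record's label and join its text pieces with a space
  summarise(group = first(group), order = paste(order, collapse = ' ')) %>%
  # drop the helper column
  select(-group1)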
# A tibble: 2 x 2
#   group order
# 1 East  this is supposed to be one line
# 2 West  this is supposed to be one line too
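To see why the cumulative sum works as a grouping key, it can help to look at the intermediate vector on its own. The following is a minimal base R sketch, not part of the answer's dplyr code; the names `starts`, `group_id` and `collapsed` are made up for illustration, and `stringsAsFactors = FALSE` is an assumption so that the columns are character vectors on any R version.

```r
# Same data as in the question, kept as character columns
df_missing <- data.frame(
  group = c("East", "", "", "West", "", "", ""),
  order = c("this", "is supposed to be", "one line",
            "this", "is supposed to be", "one line", "too"),
  stringsAsFactors = FALSE
)

# TRUE on the rows where a new record starts (non-blank 'group')
starts <- df_missing$group != ""

# Running count of record starts: each row gets the id of its record
group_id <- cumsum(starts)
print(group_id)
# [1] 1 1 1 2 2 2 2

# Paste the pieces of 'order' together within each id
collapsed <- as.vector(tapply(df_missing$order, group_id, paste, collapse = " "))
print(collapsed)
```

Every blank row inherits the id of the last non-blank row above it, which is what `group_by()` relies on in the dplyr version.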

Or, instead of creating a new grouping column, use the cumsum as an index into the unique non-blank elements of 'group' to fill in the blanks:

df_missing %>%
  # look up each row's label by its record number
  group_by(group = unique(group[group != ""])[cumsum(group != "")]) %>%
  summarise(order = paste(order, collapse = ' '))
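The index lookup can be traced step by step on a plain vector. This is a small base R sketch with illustrative names (`labels`, `idx`, `filled`, `group_rep`, `filled_rep`) that do not appear in the answer. The second half is a variation rather than the answer's code: it drops `unique()` so that the lookup still lines up when the same label starts more than one record.

```r
group <- c("East", "", "", "West", "", "", "")

labels <- unique(group[group != ""])  # "East" "West"
idx    <- cumsum(group != "")         # 1 1 1 2 2 2 2
filled <- labels[idx]                 # label number idx for every row
print(filled)
# [1] "East" "East" "East" "West" "West" "West" "West"

# Hypothetical input where "East" starts two separate records
group_rep <- c("East", "", "West", "", "East", "")

# Variation: index the non-blank labels directly, without unique(),
# so the third record still finds its label
filled_rep <- group_rep[group_rep != ""][cumsum(group_rep != "")]
print(filled_rep)
# [1] "East" "East" "West" "West" "East" "East"
```

With `unique()` the third record in `group_rep` would index past the end of the two distinct labels and come back as NA, so the original form is best suited to data where each label occurs once.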

Another option is to change the blanks to NA, fill them with the preceding non-NA value, and then, grouped by 'group', paste 'order' as above:

library(tidyr)
df_missing %>%
  # turn blank labels into NA so that fill() can see them
  mutate(group = replace(group, group == '', NA)) %>%
  # carry the last non-NA label downwards
  fill(group) %>%
  group_by(group) %>%
  summarise(order = paste(order, collapse = ' '))
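Stopping the pipeline after `fill()` shows what the grouping step receives. This sketch reuses the question's data and the same `replace()` and `fill()` calls; the name `filled_df` is made up for illustration, and `stringsAsFactors = FALSE` is an assumption so that 'group' is a character column regardless of R version.

```r
library(dplyr)
library(tidyr)

df_missing <- data.frame(
  group = c("East", "", "", "West", "", "", ""),
  order = c("this", "is supposed to be", "one line",
            "this", "is supposed to be", "one line", "too"),
  stringsAsFactors = FALSE
)

# Blank -> NA, then carry each label down over the NAs below it
filled_df <- df_missing %>%
  mutate(group = replace(group, group == "", NA)) %>%
  fill(group)

print(filled_df$group)
# [1] "East" "East" "East" "West" "West" "West" "West"
```

Because the final step groups by the label itself, records that share a label would be pasted into a single row, unlike the cumsum-based grouping, which keeps them apart.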
