santoku

Reputation: 3427

How to fill in blank rows and combine split lines in R

I have a data frame extracted from a PDF, and some text that is supposed to be on one line now spans a varying number of rows, like this:

df_missing = data.frame(group = c("East","","","West","","",""), 
                        order = c("this","is supposed to be","one line","this","is supposed to be","one line","too"))

How can I correct the data frame so that the split lines are collapsed into this?

df_correct = data.frame(group = c("East","West"), order = c("this is supposed to be one line", "this is supposed to be one line too"))

Upvotes: 1

Views: 113

Answers (2)

Andre Elrico

Reputation: 11480

Similar concept to @akrun's answer.

data.table solution:

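library(data.table)
# a new block starts at each non-blank 'group'; paste 'order' within each block, then drop the helper 'cumsum' column
setDT(df_missing)[, .(group = group[1], order = paste(order, collapse = ' ')), by = cumsum(group != "")][, -1]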

#   group                               order
#1:  East     this is supposed to be one line
#2:  West this is supposed to be one line too

Upvotes: 0

akrun

Reputation: 887128

We can do this in multiple ways. One way is to create a grouping variable by taking the cumulative sum of a logical vector that marks the non-blank elements of 'group', and then summarise 'order' by pasting its elements together.

library(dplyr)
df_missing  %>%
      group_by(group1 = cumsum(group != "")) %>% 
      summarise(group = first(group), order = paste(order, collapse= ' ')) %>% 
      select(-group1)
# A tibble: 2 x 2
#  group order                              
#  <fct> <chr>                              
#1 East  this is supposed to be one line    
#2 West  this is supposed to be one line too
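For readers without dplyr, the same cumulative-sum grouping can be sketched in base R. This is a variation, not part of the original answer: the names block and collapsed are illustrative, and stringsAsFactors = FALSE is set explicitly so the example behaves the same on R versions before 4.0, where character columns became factors by default.

```r
# Rebuild the example data; stringsAsFactors = FALSE matters only on R < 4.0
df_missing <- data.frame(
  group = c("East", "", "", "West", "", "", ""),
  order = c("this", "is supposed to be", "one line",
            "this", "is supposed to be", "one line", "too"),
  stringsAsFactors = FALSE
)

# TRUE wherever a new record starts; cumsum() turns that into a block id
block <- cumsum(df_missing$group != "")
print(block)  # 1 1 1 2 2 2 2

# One row per block: the block's first 'group' label and its pasted 'order' text
collapsed <- data.frame(
  group = as.vector(tapply(df_missing$group, block, function(x) x[1])),
  order = as.vector(tapply(df_missing$order, block, paste, collapse = " ")),
  stringsAsFactors = FALSE
)
print(collapsed)
```

The block id is the key idea: every non-blank 'group' bumps the counter by one, so all continuation rows inherit the id of the row that started them.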

Or, instead of creating a new grouping column, use the cumsum as an index into the unique non-blank elements of 'group' to fill in the blanks:

df_missing %>%
     group_by(group = unique(group[group!=""])[cumsum(group != "")])  %>% 
     summarise(order = paste(order, collapse=' '))
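The index expression above can be checked on a bare character vector. The sketch below is illustrative (group and group2 are hypothetical standalone vectors). It also shows a caveat worth knowing: if the same label starts two separate blocks, unique() shortens the lookup vector and the later blocks get NA, whereas indexing the non-blank labels without unique() still lines up one label per block.

```r
# Hypothetical standalone vector mirroring df_missing$group
group <- c("East", "", "", "West", "", "", "")

nonblank <- group != ""
labels   <- group[nonblank]   # one label per block, in order of appearance
block    <- cumsum(nonblank)  # 1 1 1 2 2 2 2
filled   <- labels[block]     # each row looks up the label of its block
print(filled)                 # "East" "East" "East" "West" "West" "West" "West"

# With a repeated label, unique() drops the second "East" and index 3 runs off the end
group2 <- c("East", "", "West", "", "East", "")
with_unique    <- unique(group2[group2 != ""])[cumsum(group2 != "")]
without_unique <- group2[group2 != ""][cumsum(group2 != "")]
print(with_unique)            # "East" "East" "West" "West" NA NA
print(without_unique)         # "East" "East" "West" "West" "East" "East"
```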

Another option is to change the blanks to NA, fill each NA with the preceding non-NA value, and then group by 'group' and paste 'order' as above:

library(tidyr)
df_missing %>%
     mutate(group = replace(group, group == '', NA)) %>% 
     fill(group) %>% 
     group_by(group) %>%
     summarise(order = paste(order, collapse= ' '))
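The fill-down step can also be sketched without tidyr. This is a substitute technique, not the answer's code: the helper fill_down is hypothetical and uses Reduce(accumulate = TRUE) to carry the last non-blank label forward, then aggregate() pastes 'order' per label. As with the tidyr version, rows are grouped by label, so two separate blocks with the same label would be merged, and the output is sorted by label. Any blanks before the first label stay blank.

```r
df_missing <- data.frame(
  group = c("East", "", "", "West", "", "", ""),
  order = c("this", "is supposed to be", "one line",
            "this", "is supposed to be", "one line", "too"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each "" with the most recent non-blank value.
# Leading blanks (before any label) are left as "".
fill_down <- function(x) {
  Reduce(function(prev, cur) if (cur == "") prev else cur, x, accumulate = TRUE)
}

df_missing$group <- fill_down(df_missing$group)

# Group by the filled label and paste 'order' within each group
collapsed <- aggregate(order ~ group, data = df_missing, FUN = paste, collapse = " ")
print(collapsed)
```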

Upvotes: 1
