Reputation: 3427
I have a data frame extracted from a PDF. Some text that is supposed to be on one line now spans a varying number of lines, like this:
df_missing = data.frame(group = c("East","","","West","","",""),
order = c("this","is supposed to be","one line","this","is supposed to be","one line","too"))
How can I collapse the split lines so that the data frame looks like this?
df_correct = data.frame(group = c("East","West"), order = c("this is supposed to be one line", "this is supposed to be one line too"))
Upvotes: 1
Views: 113
Reputation: 11480
Same idea as @akrun's answer, as a data.table solution:
library(data.table)
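# setDT() converts df_missing to a data.table by reference;
# the trailing [, -1] drops the helper cumsum column
setDT(df_missing)[, .(group = group[1], order = paste(order, collapse = " ")), by = cumsum(group != "")][, -1]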
# group order
#1: East this is supposed to be one line
#2: West this is supposed to be one line too
Upvotes: 0
Reputation: 887128
We can do this in multiple ways. One way is to create a grouping variable by taking the cumulative sum of a logical vector that flags the non-blank elements in 'group', then summarise 'order' by pasting the elements together:
library(dplyr)
df_missing %>%
group_by(group1 = cumsum(group != "")) %>%
summarise(group = first(group), order = paste(order, collapse= ' ')) %>%
select(-group1)
# A tibble: 2 x 2
# group order
# <fct> <chr>
#1 East this is supposed to be one line
#2 West this is supposed to be one line too
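To see why the cumulative sum works as a grouping key, it may help to look at the intermediate vector on its own. The following is a minimal base R sketch, not part of the dplyr answer above; `is_start`, `block_id` and `collapsed` are names made up for illustration, and `stringsAsFactors = FALSE` is an assumption added so the columns are plain character vectors on any R version.

```r
# Data from the question; stringsAsFactors = FALSE is an assumption made here
# so that both columns are character vectors regardless of R version.
df_missing <- data.frame(
  group = c("East", "", "", "West", "", "", ""),
  order = c("this", "is supposed to be", "one line",
            "this", "is supposed to be", "one line", "too"),
  stringsAsFactors = FALSE
)

# TRUE on each row that starts a new record (non-blank 'group')
is_start <- df_missing$group != ""

# Running count of record starts: all rows of one record share the same id
block_id <- cumsum(is_start)
block_id
# [1] 1 1 1 2 2 2 2

# Paste the pieces of each record together, one string per id
collapsed <- tapply(df_missing$order, block_id, paste, collapse = " ")

df_correct <- data.frame(
  group = df_missing$group[is_start],
  order = as.vector(collapsed),
  stringsAsFactors = FALSE
)
df_correct
```

Each blank row inherits the id of the last non-blank row before it, which is what lets `paste(..., collapse = " ")` rebuild the full line per record.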
Or, instead of creating a new grouping column, use the cumsum as an index to fill in the unique non-blank elements of 'group' (this assumes each non-blank value of 'group' occurs only once):
df_missing %>%
group_by(group = unique(group[group!=""])[cumsum(group != "")]) %>%
summarise(order = paste(order, collapse=' '))
Another option is to change the blanks to NA, then fill them with the preceding non-NA values and, grouped by 'group', paste the 'order' elements as above:
library(tidyr)
df_missing %>%
mutate(group = replace(group, group == '', NA)) %>%
fill(group) %>%
group_by(group) %>%
summarise(order = paste(order, collapse= ' '))
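One difference between these approaches may be worth checking against your data: grouping by the filled label and grouping by the cumulative sum only agree when no label repeats. A small base R sketch, using a made-up vector (not from the question) in which "East" starts two separate records:

```r
# Hypothetical labels, not from the question: "East" starts two separate records
groups <- c("East", "", "West", "", "East", "")

# Grouping key from the cumulative sum: one id per record
block_id <- cumsum(groups != "")

# Grouping key from filling blanks with the preceding label
filled <- groups[groups != ""][block_id]

length(unique(block_id))  # 3 groups: each record stays separate
length(unique(filled))    # 2 groups: the two "East" records are merged
```

If labels can repeat in your PDF extract, the cumsum-based grouping is probably the safer choice.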
Upvotes: 1