Reputation: 53
I have extracted multiple tables from a PDF in which some strings span multiple lines. I used the extract_tables() function from the tabulizer package; the only problem is that each line of a multi-line string is imported as a separate row.
e.g.
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
data.frame(action, description)
   action description
1       1           a
2      NA           b
3      NA           c
4       2           a
5      NA           b
6       3           a
7      NA           b
8      NA           c
9      NA           d
10      4           a
11     NA           b
I would like to concatenate the strings that belong to each action so that they appear as a single element, like this:
  action description
1      1       a b c
2      2         a b
3      3     a b c d
4      4         a b
Hope that makes sense, appreciate any help!
Upvotes: 5
Views: 1447
Reputation: 887841
Here is one option with data.table
library(data.table)
setDT(df1)[, .(description = paste(description, collapse = ' ')),
           .(action = cumsum(!is.na(action)))]
#   action description
#1:      1       a b c
#2:      2         a b
#3:      3     a b c d
#4:      4         a b
Or using na.locf from zoo:
library(zoo)
setDT(df1)[, .(description = paste(description, collapse = ' ')),
           .(action = na.locf(action))]
Both snippets assume df1 was built from the vectors in the question:
df1 <- data.frame(action, description)
Upvotes: 1
Reputation: 26373
A base R option:
dat <- data.frame(action, description)
aggregate(
  description ~ action,
  transform(dat, action = cumsum(!is.na(dat$action))),
  FUN = paste,
  collapse = " "  # passed through to paste()
)
#  action description
#1      1       a b c
#2      2         a b
#3      3     a b c d
#4      4         a b
For aggregate to work, we need to replace action with what is returned by cumsum(!is.na(dat$action)), i.e.
cumsum(!is.na(dat$action))
#[1] 1 1 1 2 2 3 3 3 3 4 4
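One thing to keep in mind: cumsum(!is.na(...)) gives a running group index, not the original action value. The two coincide in the question's data only because the actions happen to be 1, 2, 3, 4. If the labels were something else, a rough base R sketch like the one below could map the index back to the labels before aggregating. The data here (labels 10 and 20) and the names grp, filled and res are made up for illustration, and it assumes the first element of action is not NA.

```r
# Hypothetical data: action labels are 10 and 20 rather than 1, 2, ...
action <- c(10, NA, NA, 20, NA)
description <- c("a", "b", "c", "a", "b")

# Running group index: increments at every non-NA action
grp <- cumsum(!is.na(action))   # 1 1 1 2 2

# Map each index back to its original label (a base R fill-forward);
# assumes the first element of action is not NA
filled <- action[!is.na(action)][grp]

# Aggregate on the filled labels; collapse is passed through to paste()
res <- aggregate(
  description ~ action,
  data.frame(action = filled, description),
  FUN = paste,
  collapse = " "
)
print(res)
```

With this, the action column of the result should hold the original labels rather than 1, 2, ...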
Upvotes: 1
Reputation: 389235
A tidyverse way would be to fill the action column with the previous non-NA value, then group_by action and paste the description together.
library(tidyverse)
# df is the data frame from the question: df <- data.frame(action, description)
df %>%
  fill(action) %>%
  group_by(action) %>%
  summarise(description = paste(description, collapse = " "))
#  action description
#   <dbl> <chr>
#1     1. a b c
#2     2. a b
#3     3. a b c d
#4     4. a b
Upvotes: 4
Reputation: 6567
You could use the zoo and dplyr packages like so:
library(zoo)
library(dplyr)
action <- c(1, NA, NA, 2, NA, 3, NA, NA, NA, 4, NA)
description <- c("a", "b", "c", "a", "b", "a", "b", "c", "d", "a", "b")
df = data.frame(action, description)
df$action = na.locf(df$action)
df =
  df %>%
  group_by(action) %>%
  summarise(description = paste(description, collapse = ' '))
Upvotes: 0