denis

How to fill down a given text until the next given text, and so on, in R?

This has probably been answered already, but I'm struggling to find it: how can I create a new column 'new_text' in which each given text is filled down until the next given text appears, and so on?

In the example below, how can I fill 'Potter' down until 'Wisley', then 'Wisley' down until 'Granger', and so on?

The idea is to apply the proposed solution to data frames with thousands of rows (obtained with pdftools::pdf_data), choosing a set of specific words that are each filled down in this way.

Thanks for your help.

> dat0
      text new_text
1   Potter   Potter
2     hj7d   Potter
3    kl8ep   Potter
4      f3d   Potter
5   rtyzs2   Potter
6   Wisley   Wisley
7     lq6s   Wisley
8      2fg   Wisley
9  Granger  Granger
10    r8ka  Granger
11      h9  Granger
12   qm9ne  Granger  

Data:

dat0 <-
structure(list(text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2", 
"Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne"), new_text = c("Potter", 
"Potter", "Potter", "Potter", "Potter", "Wisley", "Wisley", "Wisley", 
"Granger", "Granger", "Granger", "Granger")), class = "data.frame", row.names = c(NA, 
-12L))

Answers (2)

ThomasIsCoding

@Edward's fill solution is clearly the most concise option for your problem; you definitely don't want to miss it.

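My solution uses base R only (in case you are interested, or just want to play with it for fun). It combines cumsum + %in% + ave: cumsum(text %in% nms) starts a new group at every name, and ave(..., FUN = na.omit) spreads the single matched name index in each group over the whole group, like below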

nms <- c("Potter", "Wisley", "Granger")
transform(
    df,
    new_text = nms[ave(
        match(text, nms),
        cumsum(text %in% nms),
        FUN = na.omit
    )]
)

which gives

      text new_text
1   Potter   Potter
2     hj7d   Potter
3    kl8ep   Potter
4      f3d   Potter
5   rtyzs2   Potter
6   Wisley   Wisley
7     lq6s   Wisley
8      2fg   Wisley
9  Granger  Granger
10    r8ka  Granger
11      h9  Granger
12   qm9ne  Granger
13  Potter   Potter
14    abcd   Potter
15    d9k2   Potter
16    89kx   Potter
17    dkdi   Potter

Data:

df <- structure(list(text = c(
    "Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
    "Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne",
    "Potter", "abcd", "d9k2", "89kx", "dkdi"
)), row.names = c(
    NA,
    -17L
), class = "data.frame")

> df
      text
1   Potter
2     hj7d
3    kl8ep
4      f3d
5   rtyzs2
6   Wisley
7     lq6s
8      2fg
9  Granger
10    r8ka
11      h9
12   qm9ne
13  Potter
14    abcd
15    d9k2
16    89kx
17    dkdi
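
To see what each piece contributes, the intermediate vectors can be inspected one at a time. The sketch below rebuilds the 17-row df from above; idx, grp and filled are illustrative names, not part of the answer. It assumes the first row is one of the names: if it were not, the first group would contain only NA, na.omit() would return an empty vector, and ave() would most likely stop with a "replacement has length zero" error.

```r
# Rebuild the 17-row example data used above
df <- data.frame(
  text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
           "Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne",
           "Potter", "abcd", "d9k2", "89kx", "dkdi"),
  stringsAsFactors = FALSE
)
nms <- c("Potter", "Wisley", "Granger")

# Position of each name in nms; NA for every other row
idx <- match(df$text, nms)
print(idx)

# Group id: increases by 1 every time a name appears
grp <- cumsum(df$text %in% nms)
print(grp)

# Within each group, na.omit() keeps the one non-NA index and
# ave() recycles it over every row of that group
filled <- ave(idx, grp, FUN = na.omit)
print(filled)

# Turn the indices back into names
df$new_text <- nms[filled]
print(df)
```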

Edward

One way is to convert every value that is not one of the names to NA and then use fill() from tidyr. First, set up the specific words (names) that you want to keep.

library(tidyr)

Names <- c("Potter", "Wisley", "Granger")

transform(dat0, text=ifelse(text %in% Names, text, NA)) |>
  fill(text)
      text new_text
1   Potter   Potter
2   Potter   Potter
3   Potter   Potter
4   Potter   Potter
5   Potter   Potter
6   Wisley   Wisley
7   Wisley   Wisley
8   Wisley   Wisley
9  Granger  Granger
10 Granger  Granger
11 Granger  Granger
12 Granger  Granger
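
The code above overwrites the text column. If the original text should be kept and the result stored in a new_text column, as in the question, the same idea can be applied to a fresh column instead. A minimal sketch, assuming tidyr is installed and R >= 4.1 (for the native |> pipe); dat0 is rebuilt here with only its text column, and res is an illustrative name.

```r
library(tidyr)

# Rebuild the question's input (text column only)
dat0 <- data.frame(
  text = c("Potter", "hj7d", "kl8ep", "f3d", "rtyzs2",
           "Wisley", "lq6s", "2fg", "Granger", "r8ka", "h9", "qm9ne"),
  stringsAsFactors = FALSE
)
Names <- c("Potter", "Wisley", "Granger")

# Copy the names into new_text (NA elsewhere), then carry each name down;
# the original text column is left untouched
res <- transform(dat0, new_text = ifelse(text %in% Names, text, NA)) |>
  fill(new_text)
print(res)
```

Rows that come before the first name (for example header words at the top of a pdf_data() page) stay NA with fill(), since there is nothing above them to carry down.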
