namtar
namtar

Reputation: 95

How can I collapse multiple rows into one based on empty rows in another column in R?

I have the following data frame, which was sourced from a text file:

Account_No            Title                                      Date

52683    DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.      8/03/2019
6224     KABIR IN MALWA                                        29/04/2015
25801    A VILLAGE IS A BUSY PLACE / GEETHA,V.                  5/06/2020
11439    KABIR IN AMERICA                                      29/04/2015
25802    A VILLAGE IS A BUSY PLACE / GEETHA,V.                  5/06/2020
7843     IN EVERY BODY KABIR                                   29/04/2015
13013    MOBY-DICK : A POP-UP BOOK / ITA,SAM.                  22/01/2020
38110    DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND            29/11/2010
         TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &
         MYERSO.
38118    SCIENCE COMMUNICATION IN THEORY AND PRACTICE           6/12/2010
          / STOCKLMAYER, SUSAN M Et al (ED.
7844     KABIR IN THUMRI                                       29/04/2015
25042    TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH       13/04/2018
         GUIDE / BAL,MIEKE.
         001.3 BAL

In excel, I used Data to Columns to create the following table using a fixed width delimiter:

Account_No Title Date
52683 DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA. 08-03-2019
6224 KABIR IN MALWA 29-04-2015
25801 A VILLAGE IS A BUSY PLACE / GEETHA,V. 05-06-2020
11439 KABIR IN AMERICA 29-04-2015
25802 A VILLAGE IS A BUSY PLACE / GEETHA,V. 05-06-2020
7843 IN EVERY BODY KABIR 29-04-2015
13013 MOBY-DICK : A POP-UP BOOK / ITA,SAM. 22-01-2020
38110 DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND 29-11-2010
TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &
MYERSO.
38118 SCIENCE COMMUNICATION IN THEORY AND PRACTICE 06-12-2010
/ STOCKLMAYER, SUSAN M Et al (ED.
7844 KABIR IN THUMRI 29-04-2015
25042 TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH 13-04-2018
GUIDE / BAL,MIEKE.
001.3 BAL
24655 EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO 05-11-2019
NEED TO READ RESEARCH / DANE,FRANCIS C.
001.4 DAN
30170 CASE STUDY RESEARCH METHODS / GILLHAM,BILL. 03-11-2011

The problem with this is that the some books have their titles in multiple rows (such as "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND...", where account number is 38110), whereas they should be in a single row with the corresponding Account_No.

How can I do this?

Dput of the data:

structure(list(Account_No = c("52683", "6224", "25801", "11439", 
"25802", "7843", "13013", "38110", "", "", "38118", "", "7844", 
"25042", "", "", "24655", "", "", "30170"), Title = c("DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.", 
"KABIR IN MALWA", "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "KABIR IN AMERICA", 
"A VILLAGE IS A BUSY PLACE / GEETHA,V.", "IN EVERY BODY KABIR", 
"MOBY-DICK : A POP-UP BOOK / ITA,SAM.", "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND", 
"TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &", "MYERSO.", 
"SCIENCE COMMUNICATION IN THEORY AND PRACTICE", "/ STOCKLMAYER, SUSAN M Et al (ED.", 
"KABIR IN THUMRI", "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH", 
"GUIDE / BAL,MIEKE.", "001.3 BAL", "EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO", 
"NEED TO READ RESEARCH / DANE,FRANCIS C.", "001.4 DAN", "CASE STUDY RESEARCH METHODS / GILLHAM,BILL."
), Date = c("08-03-2019", "29-04-2015", "05-06-2020", "29-04-2015", 
"05-06-2020", "29-04-2015", "22-01-2020", "29-11-2010", "", "", 
"06-12-2010", "", "29-04-2015", "13-04-2018", "", "", "05-11-2019", 
"", "", "03-11-2011")), row.names = c(NA, 20L), class = "data.frame")
``

Upvotes: 0

Views: 87

Answers (2)

TarJae
TarJae

Reputation: 78947

Here is how you can do it with tidyverse Explanation: First change blank cells of Account_No to NA then fill the NA with value above until next value then group by Account_No then combine the rows of Title to one row after that just tweak - > to get desired output

library(tidyverse)

df1 <- df %>% 
  mutate_at(vars(colnames(.)[names(.) %in% "Account_No"]),
            .funs = funs(ifelse(.=="", NA, as.character(.)))) %>% 
  fill(Account_No) %>% 
  group_by(Account_No) %>% 
  summarise(Title = paste(Title, collapse = " ")) %>% 
  ungroup () %>% 
  left_join(df, df1, by="Account_No") %>% 
  select (Account_No, Title = Title.x, Date, -Title.y)

enter image description here

Upvotes: 1

Mohan Govindasamy
Mohan Govindasamy

Reputation: 906

There might be some pretty way to do this, but this works. I'm running a for loop but somebody can help me replace it with something more efficient.

The logic behind is, first I'll convert empty strings "" to NA, then the column number here needed to be modified is the second column. So I'm running a for loop from the last row (nrow will give me the last row number) to the second row i.e I'll do the operation backwards from the last row. So lets say the row number is 10, I'll look for the same row in the n-1th column which here is 2-1 is 1. If in the 1st column has NA for that row then I check the second column the same row and if it is not NA that means the text is in the next line, so I'll take the text and combine it into the previous row of the second column and I repeat this process until I reach the second row.

It's little tricky for me to explain this in text but the concept is simple.

library(tidyverse)
df <- structure(list(Account_No = c("52683", "6224", "25801", "11439", 
                              "25802", "7843", "13013", "38110", "", "", "38118", "", "7844", 
                              "25042", "", "", "24655", "", "", "30170"), Title = c("DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.", 
                                                                                    "KABIR IN MALWA", "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "KABIR IN AMERICA", 
                                                                                    "A VILLAGE IS A BUSY PLACE / GEETHA,V.", "IN EVERY BODY KABIR", 
                                                                                    "MOBY-DICK : A POP-UP BOOK / ITA,SAM.", "DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND", 
                                                                                    "TECHNOLOGY : STRUCTURES / OWEN-JACKSON, GWYNETH &", "MYERSO.", 
                                                                                    "SCIENCE COMMUNICATION IN THEORY AND PRACTICE", "/ STOCKLMAYER, SUSAN M Et al (ED.", 
                                                                                    "KABIR IN THUMRI", "TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH", 
                                                                                    "GUIDE / BAL,MIEKE.", "001.3 BAL", "EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO", 
                                                                                    "NEED TO READ RESEARCH / DANE,FRANCIS C.", "001.4 DAN", "CASE STUDY RESEARCH METHODS / GILLHAM,BILL."
                              ), Date = c("08-03-2019", "29-04-2015", "05-06-2020", "29-04-2015", 
                                          "05-06-2020", "29-04-2015", "22-01-2020", "29-11-2010", "", "", 
                                          "06-12-2010", "", "29-04-2015", "13-04-2018", "", "", "05-11-2019", 
                                          "", "", "03-11-2011")), row.names = c(NA, 20L), class = "data.frame")


roller_coaster <- function(df, col_numb){
  for(i in nrow(df):2){
    if(is.na(df[i,(col_numb-1)])){
      if(!is.na(df[i,(col_numb)])){
        paste(df[i-1,col_numb], df[i,col_numb], sep = ' ') -> df[i-1,col_numb]
        NA -> df[i,col_numb]
      }
    }
  }
  df
}

df %>% 
  as_tibble() %>% 
  mutate(across(everything(), na_if, "")) %>% 
  roller_coaster(2) %>% 
  drop_na()
#> # A tibble: 13 x 3
#>    Account_No Title                                                     Date    
#>    <chr>      <chr>                                                     <chr>   
#>  1 52683      DESIGN IN THE TERRAIN OF WATER / MATHUR,ANURADHA.         08-03-2…
#>  2 6224       KABIR IN MALWA                                            29-04-2…
#>  3 25801      A VILLAGE IS A BUSY PLACE / GEETHA,V.                     05-06-2…
#>  4 11439      KABIR IN AMERICA                                          29-04-2…
#>  5 25802      A VILLAGE IS A BUSY PLACE / GEETHA,V.                     05-06-2…
#>  6 7843       IN EVERY BODY KABIR                                       29-04-2…
#>  7 13013      MOBY-DICK : A POP-UP BOOK / ITA,SAM.                      22-01-2…
#>  8 38110      DEVELOPING SUBJECT KNOWLEDGE IN DESIGN AND TECHNOLOGY : … 29-11-2…
#>  9 38118      SCIENCE COMMUNICATION IN THEORY AND PRACTICE / STOCKLMAY… 06-12-2…
#> 10 7844       KABIR IN THUMRI                                           29-04-2…
#> 11 25042      TRAVELLING CONCEPTS IN THE HUMANITIES - A ROUGH GUIDE / … 13-04-2…
#> 12 24655      EVALUTING RESEARCH : METHODOLOGY FOR PEOPLE WHO NEED TO … 05-11-2…
#> 13 30170      CASE STUDY RESEARCH METHODS / GILLHAM,BILL.               03-11-2…

Created on 2021-02-11 by the reprex package (v0.3.0)

Upvotes: 1

Related Questions