user1007587
user1007587

Reputation: 13

Combining multiple columns in R?

Is there a way to shift a group of columns into their own row in R?

I currently have large a dataset that includes column headings like this:

Month Year Tenant 1 Name Tenant 1 Rate Tenant 1 Vacate Date Tenant 1 Notes Tenant 1 Name Tenant 2 Rate Tenant 2 Vacate Date Tenant 2 Notes
Jan 2001 Bob 1 2 3 Joe 1 2 3

I want to combine this information so that each Tenant within each month and year have their own rows. So the rows would just be like this:

Month Year Name Rate Date Notes
Jan 2001 Bob 1 2 3
Jan 2001 Joe 1 2 3

I assume this would be something like group_by() but for multiple columns somehow?

Sorry for the clumsy formatting!

Upvotes: 1

Views: 125

Answers (3)

Anoushiravan R
Anoushiravan R

Reputation: 21908

Thanks for the subtle tip from dear @akrun as always. I added a $ to the last capturing group to make sure it always chooses the last one. This may sound a bit verbose, but it also does the trick. I created 3 name patterns, turning the first two into NA and capture the third one:

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(!c(Month, Year), names_to = c(NA, NA, ".value"),
               names_pattern = "(\\w+) (\\w+) (\\w+$)")

# A tibble: 2 x 6
  Month  Year Name   Rate Vacate Notes
  <chr> <int> <chr> <int>  <int> <int>
1 Jan    2001 Bob       1      2     3
2 Jan    2001 Joe       1      2     3

Upvotes: 2

akrun
akrun

Reputation: 886998

If the column name for 'Joe' is 'Tenant 2 Name', use pivot_longer, specify the cols as all except the 'Month', 'Year', and with names_pattern, capture the column name substring as the characters that are not a space (\\S+) at the end ($) of the string

library(tidyr)
pivot_longer(df1, cols = -c(Month, Year), 
     names_to = ".value", names_pattern = ".*\\s+(\\S+)$")

-output

# A tibble: 2 x 6
#  Month  Year Name   Rate  Date Notes
#  <chr> <int> <chr> <int> <int> <int>
#1 Jan    2001 Bob       1     2     3
#2 Jan    2001 Joe       1     2     3

data

df1 <- structure(list(Month = "Jan", Year = 2001L, `Tenant 1 Name` = "Bob", 
    `Tenant 1 Rate` = 1L, `Tenant 1 Vacate Date` = 2L, `Tenant 1 Notes` = 3L, 
    `Tenant 2 Name` = "Joe", `Tenant 2 Rate` = 1L, `Tenant 2 Vacate Date` = 2L, 
    `Tenant 2 Notes` = 3L), class = "data.frame", row.names = c(NA, 
-1L))

Upvotes: 2

Federico Giorgi
Federico Giorgi

Reputation: 10735

First, to generate an example like yours (your example had "Tenant 1 Name" twice, but I guess it was just a typo).

colnames<-c("Month","Year","Tenant 1 Name","Tenant 1 Rate","Tenant 1 Vacate Date","Tenant 1 Notes","Tenant 2 Name","Tenant 2 Rate","Tenant 2 Vacate Date","Tenant 2 Notes")
fields<-c("Jan","2001","Bob","1","2","3","Joe","1","2","3")
mat<-matrix(fields,nrow=1)
colnames(mat)<-colnames
View(mat)

It will look like this:

Your example

Now, identify which column have "Name" in them

cols<-grep("Name",colnames(mat))
cols

Then, extract names from those columns:

names<-mat[,cols]

And finally, filla new matrix:

newmat<-matrix(NA,nrow=0,ncol=6)
for(n in names){
    whichcol<-which(mat[1,]==n)
    newline<-c(mat[,1:2],mat[,whichcol:(whichcol+3)])
    newmat<-rbind(newmat,newline)
}
View(newmat)

It will result in what you are looking for:

newresults

However, I have a feeling that the dataset you are working with has more layers of complexity (e.g., multiple lines), requiring a more complex solution. Please let us know if that's the case!

Upvotes: 2

Related Questions