Reputation: 13
Is there a way to shift a group of columns into their own row in R?
I currently have large a dataset that includes column headings like this:
Month | Year | Tenant 1 Name | Tenant 1 Rate | Tenant 1 Vacate Date | Tenant 1 Notes | Tenant 1 Name | Tenant 2 Rate | Tenant 2 Vacate Date | Tenant 2 Notes |
---|---|---|---|---|---|---|---|---|---|
Jan | 2001 | Bob | 1 | 2 | 3 | Joe | 1 | 2 | 3 |
I want to combine this information so that each Tenant within each month and year have their own rows. So the rows would just be like this:
Month | Year | Name | Rate | Date | Notes |
---|---|---|---|---|---|
Jan | 2001 | Bob | 1 | 2 | 3 |
Jan | 2001 | Joe | 1 | 2 | 3 |
I assume this would be something like group_by() but for multiple columns somehow?
Sorry for the clumsy formatting!
Upvotes: 1
Views: 125
Reputation: 21908
Thanks for the subtle tip from dear @akrun as always. I added a $
to the last capturing group to make sure it always chooses the last one.
This may sound a bit verbose, but it also does the trick. I created 3 name patterns, turning the first two into NA
and capture the third one:
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(!c(Month, Year), names_to = c(NA, NA, ".value"),
names_pattern = "(\\w+) (\\w+) (\\w+$)")
# A tibble: 2 x 6
Month Year Name Rate Vacate Notes
<chr> <int> <chr> <int> <int> <int>
1 Jan 2001 Bob 1 2 3
2 Jan 2001 Joe 1 2 3
Upvotes: 2
Reputation: 886998
If the column name for 'Joe' is 'Tenant 2 Name', use pivot_longer
, specify the cols
as all except the 'Month', 'Year', and with names_pattern
, capture the column name substring as the characters that are not a space (\\S+
) at the end ($
) of the string
library(tidyr)
pivot_longer(df1, cols = -c(Month, Year),
names_to = ".value", names_pattern = ".*\\s+(\\S+)$")
-output
# A tibble: 2 x 6
# Month Year Name Rate Date Notes
# <chr> <int> <chr> <int> <int> <int>
#1 Jan 2001 Bob 1 2 3
#2 Jan 2001 Joe 1 2 3
df1 <- structure(list(Month = "Jan", Year = 2001L, `Tenant 1 Name` = "Bob",
`Tenant 1 Rate` = 1L, `Tenant 1 Vacate Date` = 2L, `Tenant 1 Notes` = 3L,
`Tenant 2 Name` = "Joe", `Tenant 2 Rate` = 1L, `Tenant 2 Vacate Date` = 2L,
`Tenant 2 Notes` = 3L), class = "data.frame", row.names = c(NA,
-1L))
Upvotes: 2
Reputation: 10735
First, to generate an example like yours (your example had "Tenant 1 Name" twice, but I guess it was just a typo).
colnames<-c("Month","Year","Tenant 1 Name","Tenant 1 Rate","Tenant 1 Vacate Date","Tenant 1 Notes","Tenant 2 Name","Tenant 2 Rate","Tenant 2 Vacate Date","Tenant 2 Notes")
fields<-c("Jan","2001","Bob","1","2","3","Joe","1","2","3")
mat<-matrix(fields,nrow=1)
colnames(mat)<-colnames
View(mat)
It will look like this:
Now, identify which column have "Name" in them
cols<-grep("Name",colnames(mat))
cols
Then, extract names from those columns:
names<-mat[,cols]
And finally, filla new matrix:
newmat<-matrix(NA,nrow=0,ncol=6)
for(n in names){
whichcol<-which(mat[1,]==n)
newline<-c(mat[,1:2],mat[,whichcol:(whichcol+3)])
newmat<-rbind(newmat,newline)
}
View(newmat)
It will result in what you are looking for:
However, I have a feeling that the dataset you are working with has more layers of complexity (e.g., multiple lines), requiring a more complex solution. Please let us know if that's the case!
Upvotes: 2