Reputation: 163
I have a dataframe where I need to combine every tenth column. As an example my dataframe looks like this:
V1 V2 V3 V4 V5 V6
A B C D E F
What I'd like is:
first second third
AD BE CF
to do this I've been trying to use lapply and a custom function:
col_unite <- c(c('V1', 'V4', 'first'), c('V2', 'V5', 'second'), c('V3', 'V6', 'third'))
#function that combines columns and gives a new name
unite_cols <- function(dataframe, col_list){
dataframe[[col_list[3]]] <- paste0(dataframe[[col_list[1]]], ', ',
dataframe[[col_list[2]]])
dataframe
}
new_df <- lapply(col_unite, function(x){
df <- unite_cols(orig_df, x)
# df[x[3]] <- paste0(test[x[1]], ', ', test[x[2]]) # a different attempt
return(df)
})
I know I could do this with a bunch of unite statements, or maybe there is a way to do this with a loop and unite. But given the number of columns I need to loop through I'd like to try to do it in a more elegant way.
Upvotes: 1
Views: 716
Reputation: 270195
1) Convert the data frame to a 3d array, apply paste
over its first two dimensions giving matrix m
and convert back to data frame:
a <- array(as.matrix(d), c(nrow(d), 3, ncol(d)/3))
m <- apply(a, 1:2, paste, collapse = "")
as.data.frame(m, stringsAsFactors = FALSE)
giving
V1 V2 V3
1 ad be cf
2 gj hk il
2) Another approach if we have a grouping vector g
is the following. This does not require evenly spaced columns as we can define g
arbitrarily.
g <- rep(1:3, length = ncol(d))
Paste0 <- function(x) do.call("mapply", c(paste0, x))
as.data.frame(lapply(split(as.list(d), g), Paste0), stringsAsFactors = FALSE)
giving:
a b c
a ad be cf
g gj hk il
We used this 2x6 data frame as test input:
d <- structure(list(V1 = c("a", "g"), V2 = c("b", "h"), V3 = c("c",
"i"), V4 = c("d", "j"), V5 = c("e", "k"), V6 = c("f", "l")),
class = "data.frame", row.names = c(NA, -2L))
which looks like this:
> d
V1 V2 V3 V4 V5 V6
1 a b c d e f
2 g h i j k l
Upvotes: 2
Reputation: 12084
First, I define the data frame.
df <- read.table(text = "V1 V2 V3 V4 V5 V6
A B C D E F", header = TRUE, colClasses = "character")
Then, I use lapply
to go through and paste
each column with the column 3 places over and then recast the resulting list as a data frame.
lapply(1:(ncol(df)/2), function(x)paste0(df[,c(x,x+3)], collapse = "")) %>%
data.frame
The pipe (%>%
) requires the dplyr
package.
# X.AD. X.BE. X.CF.
# 1 AD BE CF
Alternatively, I create a list of paired column names to combine. Much the same approach otherwise.
MyList <- list(c("V1", "V4"), c("V2", "V5"), c("V3", "V6"))
lapply(MyList, function(x)paste0(df[,x], collapse = "")) %>%
data.frame
# X.AD. X.BE. X.CF.
# 1 AD BE CF
Upvotes: 2
Reputation: 50728
How about something like this
n <- 4
rbind.data.frame(apply(matrix(df, nrow = n - 1), 1, paste0, collapse = ""))
# X.AD. X.BE. X.CF.
#1 AD BE CF
Here we combine every n = 4
th entry by casting the data.frame
as a matrix
with n - 1
rows, paste0
ing rows, and converting the resulting matrix
back into a data.frame
.
The only thing left is to come up with suitable column names.
df <- read.table(text =
"V1 V2 V3 V4 V5 V6
A B C D E F", header = T, colClasses = "character")
Upvotes: 3