William

Reputation: 402

Dynamically merge column name and values of a dataframe using lapply or apply or mapply functions

I have a foo_dataframe (see below) that I want to convert to transaction data:

foo_dataframe <- data.frame(replicate(50,1:4))
foo_dataframe 
#  X1 X2 X3 X4 X5 X6 X7 X8 X9...................X50
#1  1  1  1  1  1  1  1  1  1
#2  2  2  2  2  2  2  2  2  2
#3  3  3  3  3  3  3  3  3  3
#4  4  4  4  4  4  4  4  4  4

The transaction data I am expecting is below (i.e. the transaction data must be a concatenation of column name and each value of the dataframe):

#   X1    X2    X3    X4 ................X50
#1 X1 1  X2 1  X3 1  X4 1               X50 1 
#2 X1 2  X2 2  X3 2  X4 2               X50 2
#3 X1 3  X2 3  X3 3  X4 3               X50 3 
#4 X1 4  X2 4  X3 4  X4 4               X50 4 

I can concatenate each column and its values with this code:

m <- paste(colnames(foo_dataframe)[1], foo_dataframe[[1]], "")
n <- paste(colnames(foo_dataframe)[2], foo_dataframe[[2]], "")
o <- paste(colnames(foo_dataframe)[3], foo_dataframe[[3]], "")
p <- paste(colnames(foo_dataframe)[4], foo_dataframe[[4]], "")

And later join them using data.frame(m,n,o,p) to produce:

#   X1    X2    X3    X4
#1 X1 1  X2 1  X3 1  X4 1 
#2 X1 2  X2 2  X3 2  X4 2 
#3 X1 3  X2 3  X3 3  X4 3 
#4 X1 4  X2 4  X3 4  X4 4

Because I have many columns, I think this could be done dynamically with the apply family of functions. However, when I tried lapply with the code below:

c <- 1:length(colnames(foo_dataframe))
t <- foo_dataframe
transactionData <- function(t, c){ # t = dataframe; c = column no.
  paste(colnames(t)[c], t[[c]], "")
}
foo_transactionData <- lapply(t, transactionData, c)

I got the following error:

Error in t[[c]] : attempt to select more than one element in vectorIndex

I have searched Stack Overflow for a solution but have not found one. Any help will be appreciated. Thanks.

Upvotes: 0

Views: 160

Answers (1)

Ronak Shah

Reputation: 389135

We can use Map, which calls paste once per column, pairing each column name with that column's values:

foo_dataframe[] <- Map(paste, names(foo_dataframe), foo_dataframe)

foo_dataframe[, 1:4]

#    X1   X2   X3   X4
#1 X1 1 X2 1 X3 1 X4 1
#2 X1 2 X2 2 X3 2 X4 2
#3 X1 3 X2 3 X3 3 X4 3
#4 X1 4 X2 4 X3 4 X4 4
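
To see what Map is doing here, a minimal sketch on a smaller data frame may help. The name df and the 3-column size are stand-ins chosen for illustration, not part of the original question:

```r
# Small stand-in for foo_dataframe (3 columns instead of 50)
df <- data.frame(replicate(3, 1:4))

# Map walks the names and the columns in parallel:
# paste("X1", df$X1), paste("X2", df$X2), paste("X3", df$X3)
pasted <- Map(paste, names(df), df)

# Assigning into df[] replaces the column contents but keeps
# the data.frame shape and the original column names
df[] <- pasted
df
```

The result is a list with one character vector per column, so assigning it to df[] rather than df is what keeps the object a data frame.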

Using lapply, we can loop over the column indices or the column names:

foo_dataframe[] <- lapply(names(foo_dataframe), function(x) 
                   paste(x, foo_dataframe[[x]]))
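
The index-based variant mentioned above could look like the following sketch (df is again an illustrative stand-in). It also hints at why the attempt in the question likely failed: lapply(t, ...) hands the function each column itself rather than a position, so the whole index vector was used inside [[ ]] at once.

```r
# Small stand-in for foo_dataframe
df <- data.frame(replicate(3, 1:4))

# Loop over column positions; each call receives a single index i,
# so names(df)[i] and df[[i]] each select exactly one column
df[] <- lapply(seq_along(df), function(i) paste(names(df)[i], df[[i]]))
df
```

seq_along(df) is used instead of 1:length(...) because it also behaves sensibly when the data frame has no columns.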

The equivalent options using purrr are:

library(purrr)
imap_dfc(foo_dataframe, ~paste(.y, .x))
map2_dfc(foo_dataframe, names(foo_dataframe), ~paste(.y, .x))
map_dfc(names(foo_dataframe), ~paste(.x, foo_dataframe[[.x]]))

EDIT

To avoid pasting NA values, we can do:

foo_dataframe[] <- Map(function(x, y) ifelse(is.na(y), "",paste(x, y)), 
                       names(foo_dataframe), foo_dataframe)
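
As a quick check of the NA handling, here is a small sketch with made-up data (the data frame df and its values are assumptions for illustration only):

```r
# Toy data with missing values in both columns
df <- data.frame(X1 = c(1, NA, 3), X2 = c(NA, 2, 3))

# Where a value is NA, emit an empty string instead of "X1 NA"
df[] <- Map(function(x, y) ifelse(is.na(y), "", paste(x, y)),
            names(df), df)
df
```

Missing cells come out as empty strings, so they no longer show up as items such as "X1 NA" in the transaction data.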

Upvotes: 2
