Reputation: 819

R - sort on each variable, store id and value columns in output df

I have a data frame that is approximately 80x300, so it's fairly large and I need an efficient approach. Example below.

id <- c("Alpha", "Bravo", "Charlie", "Delta")
var1 <- c(1, 6, 4, 9)
var2 <- c(57, 49, 88, 14)
var3 <- c(11, 67, 2, 44)
df <- data.frame(id, var1, var2, var3)

I would like to end up with a separate data frame which sorts this data by each variable and stores the id column and the value variable column. It would look something like this:

var1.n     var1.v     var2.n     var2.v     ...
Delta      9          Charlie    88
Bravo      6          Alpha      57
Charlie    4          Bravo      49
Alpha      1          Delta      14
...

A previous attempt at this did not include value variables (only name variables) and was done using this method:

out <- as.data.frame(apply(df[,-1], 2, function(x) df$id[order(-x)]))

However, I haven't been able to figure out how to expand this to include both the id column AND the value variable. I tried the two methods below, but 1) I could never quite get the code to run properly because it uses some commands I'm not fully familiar with, and 2) I couldn't figure out how to implement exactly what I had in my head. The first was an attempt to work in the original data frame by injecting a column of NAs at each spot, but I soon figured out that wouldn't work. The second was trying to create a new output frame in which I sort by the i'th column, then store the id variable, then store the i'th column. That seemed promising, but I must be missing something, because it either runs without doing anything or gives something like a replacement error.

# attempt 1
for (i in 1:ncol(df)) {
  df<- as.data.frame(append(df, list(paste(colnames(df)[i],"name", sep="_")=NA), after=i))
  df<- order(df[i]) # would need to skip alternating rows
  df[i] <- df$id # not right at all
}

# attempt two
for (i in 1:ncol(df)) {
  order(df[i])
  out$paste(colnames(df)[i],"name", sep="_")] <- df$id
  out$paste(colnames(df)[i]) <- df[i]
}

There are extra nuances in this, so I'd love a generalizable method if possible, but I'll take all the help I can get.

Upvotes: 0

Answers (3)

Wyldsoul

Reputation: 1553

This can be done with lapply.

df1 <- lapply(names(df)[-1], function(x) {
  o <- cbind(df[1], df[x])                            # id column plus the current variable
  colnames(o) <- c(paste0(x, ".n"), paste0(x, ".v"))
  o[order(-o[[2]]), ]                                 # sort descending by the value column
})
df2 <- do.call(cbind, df1)
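
One caveat: negating the values inside order() only works for numeric columns. Below is a minimal variant that uses decreasing = TRUE instead, so the same code should also cope with character columns. It assumes the question's sample data (with the missing quote in the id vector restored), and blocks is just an illustrative name.

```r
# Sample data from the question (assumed; quote before Delta restored).
id <- c("Alpha", "Bravo", "Charlie", "Delta")
df <- data.frame(id, var1 = c(1, 6, 4, 9), var2 = c(57, 49, 88, 14),
                 var3 = c(11, 67, 2, 44))

# One two-column block (id, value) per variable, each sorted descending.
blocks <- lapply(names(df)[-1], function(x) {
  o <- df[order(df[[x]], decreasing = TRUE), c("id", x)]
  names(o) <- paste0(x, c(".n", ".v"))
  o
})

# Bind the blocks side by side into a single data frame.
df2 <- do.call(cbind, blocks)
```

The sort direction can be flipped by changing decreasing = TRUE to FALSE, without touching the rest of the function.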

Upvotes: 0

Daniel Anderson

Reputation: 2424

What about something like this?

sort_id <- function(d, column) {
  sorted <- d[order(d[ ,column], decreasing = TRUE), c(1, column)]
  names(sorted) <- paste0(names(sorted)[2], c(".n", ".v"))
  sorted
}


dfs <- Map(sort_id, replicate(3, df, simplify = FALSE), 2:4)
do.call(cbind, dfs)

   var1.n var1.v  var2.n var2.v  var3.n var3.v
4   Delta      9 Charlie     88   Bravo     67
2   Bravo      6   Alpha     57   Delta     44
3 Charlie      4   Bravo     49   Alpha     11
1   Alpha      1   Delta     14 Charlie      2
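
If the number of variables is not known in advance, the hard-coded replicate(3, ...) and 2:4 can be derived from the data frame itself. Here is a small sketch under that assumption. It repeats sort_id so that it runs on its own. value_cols and out are illustrative names, and the row-name reset at the end is an optional extra.

```r
sort_id <- function(d, column) {
  sorted <- d[order(d[, column], decreasing = TRUE), c(1, column)]
  names(sorted) <- paste0(names(sorted)[2], c(".n", ".v"))
  sorted
}

# Sample data from the question (assumed).
df <- data.frame(id = c("Alpha", "Bravo", "Charlie", "Delta"),
                 var1 = c(1, 6, 4, 9), var2 = c(57, 49, 88, 14),
                 var3 = c(11, 67, 2, 44))

value_cols <- seq_along(df)[-1]               # every column position except id
dfs <- lapply(value_cols, sort_id, d = df)    # df is passed by name, not replicated
out <- do.call(cbind, dfs)
rownames(out) <- NULL                         # drop the 4, 2, 3, 1 labels of the first block
```

The row names in the printed output above come from the first sorted block only, so they do not describe the other column pairs. Resetting them avoids that confusion.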

Upvotes: 1

Matin Kh

Reputation: 5178

If I am not mistaken, you want a new data.frame with the sorted variables and an individual ID column next to each one.

I think this is what you are looking for (I wrote it based on your own example):

df2 <- data.frame(matrix(nrow = nrow(df), ncol = 0))
for(i in 2:ncol(df)) {
    newColName.n <- paste(colnames(df)[i], "n", sep = ".") # ID column for the current variable.
    newColName.v <- paste(colnames(df)[i], "v", sep = ".") # Sorted variable column in descending order.
    idx <- order(df[, i], decreasing = TRUE)
    temp <- data.frame(v1 = df$id[idx], v2 = df[idx, i])
    colnames(temp) <- c(newColName.n, newColName.v)
    df2 <- cbind(df2, temp)
}

In the end, df2 is what you want.
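
With a few hundred variables, calling cbind inside the loop rebuilds the data frame on every iteration. That is unlikely to matter at this size, but a possible alternative is to collect the columns in a preallocated list and convert it once at the end. This is a sketch, assuming the same sample data. vars and cols are illustrative names.

```r
# Sample data from the question (assumed).
df <- data.frame(id = c("Alpha", "Bravo", "Charlie", "Delta"),
                 var1 = c(1, 6, 4, 9), var2 = c(57, 49, 88, 14),
                 var3 = c(11, 67, 2, 44))

vars <- names(df)[-1]                                  # all columns except id
cols <- vector("list", 2 * length(vars))               # two output columns per variable
names(cols) <- paste0(rep(vars, each = 2), c(".n", ".v"))

for (v in vars) {
  idx <- order(df[[v]], decreasing = TRUE)             # descending order for this variable
  cols[[paste0(v, ".n")]] <- df$id[idx]                # ids in that order
  cols[[paste0(v, ".v")]] <- df[[v]][idx]              # sorted values
}

df2 <- as.data.frame(cols)                             # single conversion at the end
```

The result has the same column layout as the loop above, with default row names 1 to n.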

Upvotes: 1
