Reputation: 819
I have a data frame that is approximately 80x300, so it's fairly large and the operation needs to be efficient. Example below.
id <- c("Alpha", "Bravo", "Charlie", "Delta")
var1 <- c(1, 6, 4, 9)
var2 <- c(57, 49, 88, 14)
var3 <- c(11, 67, 2, 44)
df <- data.frame(id, var1, var2, var3)
I would like to end up with a separate data frame which sorts this data by each variable and stores the id column and the value variable column. It would look something like this:
var1.n var1.v var2.n var2.v ...
Delta 9 Charlie 88
Bravo 6 Alpha 57
Charlie 4 Bravo 49
Alpha 1 Delta 14
...
A previous attempt at this did not include the value variables (only the name variables) and was done using this method:
out <- as.data.frame(apply(df[,-1], 2, function(x) df$id[order(-x)]))
However, I haven't been able to figure out how to expand this to include both the id column AND the value variable. I tried the two methods below, but 1) I could never get the code to run properly because it uses some commands I'm not fully familiar with, and 2) I couldn't figure out how to implement exactly what I had in my head. The first was an attempt to work in the original data frame by injecting a column of NAs at each spot, but I soon figured out that wouldn't work. The second was trying to create a new output frame in which I sort by the i'th column, then store the id variable, then store the i'th column. That seemed promising, but I must be missing something because it either runs without doing anything or gives something like a replacement error.
# attempt 1
for (i in 1:ncol(df)) {
df<- as.data.frame(append(df, list(paste(colnames(df)[i],"name", sep="_")=NA), after=i))
df<- order(df[i]) # would need to skip alternating rows
df[i] <- df$id # not right at all
}
# attempt two
for (i in 1:ncol(df)) {
order(df[i])
out$paste(colnames(df)[i],"name", sep="_")] <- df$id
out$paste(colnames(df)[i]) <- df[i]
}
There are extra nuances in this so I'd love a generalizable method if possible but I'll take all the help I can get.
Upvotes: 0
Views: 221
Reputation: 1553
This can be done with lapply.
df1 <- lapply(names(df)[-1], function(x) {
  o <- cbind(df[1], df[x])  # pair the id column with variable x
  colnames(o) <- c(paste0(x, ".n"), paste0(x, ".v"))
  o[order(-o[[2]]), ]       # sort by the value column, descending, and return
})
df2 <- do.call(cbind, df1)
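For anyone who wants to paste and run this in one go, here is a minimal self-contained sketch of the same lapply idea using the example data from the question. The name pieces and the step that resets the row names are my own additions, not part of the code above; resetting the row names just keeps the original row numbers of the first variable from showing up in the combined result.

```r
# Example data from the question
df <- data.frame(
  id   = c("Alpha", "Bravo", "Charlie", "Delta"),
  var1 = c(1, 6, 4, 9),
  var2 = c(57, 49, 88, 14),
  var3 = c(11, 67, 2, 44)
)

# One two-column data frame (id, value) per variable, sorted descending
pieces <- lapply(names(df)[-1], function(x) {
  o <- df[order(df[[x]], decreasing = TRUE), c("id", x)]
  names(o) <- paste0(x, c(".n", ".v"))
  rownames(o) <- NULL  # assumption: the original row numbers are not needed
  o
})

# Bind the pieces side by side
df2 <- do.call(cbind, pieces)
df2
```

Because each piece is sorted independently, a row of df2 no longer corresponds to a single id; each .n/.v pair has to be read on its own.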
Upvotes: 0
Reputation: 2424
What about something like this?
sort_id <- function(d, column) {
sorted <- d[order(d[ ,column], decreasing = TRUE), c(1, column)]
names(sorted) <- paste0(names(sorted)[2], c(".n", ".v"))
sorted
}
dfs <- Map(sort_id, replicate(3, df, simplify = FALSE), 2:4)
do.call(cbind, dfs)
var1.n var1.v var2.n var2.v var3.n var3.v
4 Delta 9 Charlie 88 Bravo 67
2 Bravo 6 Alpha 57 Delta 44
3 Charlie 4 Bravo 49 Alpha 11
1 Alpha 1 Delta 14 Charlie 2
Upvotes: 1
Reputation: 5178
If I am not mistaken, you want a new data.frame with the sorted variables and an individual ID column next to each one.
I think this is what you are looking for (I wrote it based on your own example):
df2 <- data.frame(matrix(nrow = nrow(df), ncol = 0))
for(i in 2:ncol(df)) {
newColName.n <- paste(colnames(df)[i], "n", sep = ".") # ID column for the current variable.
newColName.v <- paste(colnames(df)[i], "v", sep = ".") # Sorted variable column in descending order.
idx <- order(df[, i], decreasing = TRUE)
temp <- data.frame(v1 = df$id[idx], v2 = df[idx, i])
colnames(temp) <- c(newColName.n, newColName.v)
df2 <- cbind(df2, temp)
}
In the end, df2 is what you want.
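Since the question asks for something generalizable, the loop could be wrapped in a function along these lines. This is only a sketch: sort_each and its arguments id_col and decreasing are names I made up for illustration, and it assumes every non-id column can be ordered (numeric, character, etc.).

```r
# Hypothetical helper (sort_each, id_col and decreasing are illustrative names)
sort_each <- function(data, id_col = 1, decreasing = TRUE) {
  id_name    <- names(data[id_col])            # works with a position or a name
  value_cols <- setdiff(names(data), id_name)  # every column except the id

  pieces <- lapply(value_cols, function(col) {
    idx <- order(data[[col]], decreasing = decreasing)
    out <- data.frame(data[[id_name]][idx], data[[col]][idx],
                      stringsAsFactors = FALSE)
    names(out) <- paste0(col, c(".n", ".v"))
    out
  })

  do.call(cbind, pieces)
}

# Example data from the question
df <- data.frame(
  id   = c("Alpha", "Bravo", "Charlie", "Delta"),
  var1 = c(1, 6, 4, 9),
  var2 = c(57, 49, 88, 14),
  var3 = c(11, 67, 2, 44)
)

res_desc <- sort_each(df)                      # largest values first
res_asc  <- sort_each(df, decreasing = FALSE)  # smallest values first
```

With order()'s defaults, NA values end up at the bottom of each pair of columns regardless of the sort direction, so check that this is what you want if your real data has missing values.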
Upvotes: 1