Reputation: 171
I am an inexperienced data wrangler so forgive my simple language.
I have a dataframe df
  sample1 sample2 sample3 median
1       1       2       3      2
2       2       2       5      2
3       5       4       5      5
4       5       6       5      5
5       2       6       6      6
and I want to collect, for each unique value in the median column, all the sample values from the rows that have that median.
Something like this:
median2
[1] 1 2 3 2 2 5
median5
[1] 5 4 5 5 6 5
median6
[1] 2 6 6
I want to do this for a large dataframe, so I don't want to write one line per median value, like
median2 <- df[df$median == '2',]
Upvotes: 1
Views: 206
Reputation: 1474
Here is a tidyverse approach.
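library(tidyverse)
# df is the dataframe from the question
dat1 <- df |>
  mutate(median = paste0("median", median)) |>
  group_by(median) |>
  # cur_data() is deprecated as of dplyr 1.1.0; pick(everything()) is its replacement
  summarise(new_var = list(
    as.integer(t(cur_data()))
  )) |>
  deframe()
dat1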
# $median2
# [1] 1 2 3 2 2 5
#
# $median5
# [1] 5 4 5 5 6 5
#
# $median6
# [1] 2 6 6
Steps:
Convert the dataframe within each group of median values (given by cur_data()) to an integer vector. Note that it is first transposed with t(), which also turns it into a matrix, so that the values come out row by row.
deframe() from the tibble package converts the two-column result (a name column and a list column) into a named list.
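To see why the t() step matters, here is a minimal base R sketch using only the two rows whose median is 2. The names grp, col_order and row_order are made up for illustration and are not part of the answer above.

```r
# Sample columns of the two rows from the question whose median is 2.
grp <- data.frame(
  sample1 = c(1L, 2L),
  sample2 = c(2L, 2L),
  sample3 = c(3L, 5L)
)

# Without t(): the matrix is read column by column.
col_order <- as.integer(as.matrix(grp))

# With t(): the transposed matrix is read column by column,
# which is the same as reading the original rows one after another.
row_order <- as.integer(t(grp))

print(col_order)  # 1 2 2 2 3 5
print(row_order)  # 1 2 3 2 2 5
```

The second ordering is the one shown in the question's desired output.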
Upvotes: 1
Reputation: 388807
You can use split() after flattening the sample columns into a vector.
cols <- grep('sample', names(df))
split(c(t(df[cols])), paste0('median', rep(df$median, each = length(cols))))
#$median2
#[1] 1 2 3 2 2 5
#$median5
#[1] 5 4 5 5 6 5
#$median6
#[1] 2 6 6
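The one-liner can be unpacked step by step. This is only a sketch: it assumes the question's data is rebuilt as the df below, and the intermediate names values, labels and res are introduced here for illustration.

```r
# Rebuild the example data from the question (an assumption about how df was created).
df <- data.frame(
  sample1 = c(1L, 2L, 5L, 5L, 2L),
  sample2 = c(2L, 2L, 4L, 6L, 6L),
  sample3 = c(3L, 5L, 5L, 5L, 6L),
  median  = c(2L, 2L, 5L, 5L, 6L)
)

# Positions of the sample columns.
cols <- grep("sample", names(df))

# Flatten row by row: t() transposes, c() drops the matrix dimensions.
values <- c(t(df[cols]))

# One label per value: each row's median repeated once per sample column.
labels <- paste0("median", rep(df$median, each = length(cols)))

# Group the values by their label.
res <- split(values, labels)
print(res)
```

Because values and labels are built in the same row-by-row order, every value ends up under the median of the row it came from.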
Upvotes: 1
Reputation: 160407
out <- by(df, df$median, function(z) unlist(subset(z, select = -median), use.names = FALSE))
out
# df$median: 2
# [1] 1 2 2 2 3 5
# ---------------------------------------------------------------------------------------------------------------------------------------------------
# df$median: 5
# [1] 5 5 4 6 5 5
# ---------------------------------------------------------------------------------------------------------------------------------------------------
# df$median: 6
# [1] 2 6 6
Note that the object returned by by() (class "by") is really just a glorified list, so it can be handled in the same ways:
str(out)
# List of 3
# $ 2: int [1:6] 1 2 2 2 3 5
# $ 5: int [1:6] 5 5 4 6 5 5
# $ 6: int [1:3] 2 6 6
# - attr(*, "dim")= int 3
# - attr(*, "dimnames")=List of 1
# ..$ df$median: chr [1:3] "2" "5" "6"
# - attr(*, "call")= language by.data.frame(data = df, INDICES = df$median, FUN = function(z) unlist(subset(z, select = -median), use.names = FALSE))
# - attr(*, "class")= chr "by"
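If the row-by-row order from the question and a plain named list are wanted, the by() call can be adapted. This is a sketch under assumptions, not the code above: it rebuilds the question's data as df, swaps unlist() for c(t(...)) to read values row by row, and uses a made-up name lst for the result.

```r
# Rebuild the example data from the question (an assumption about how df was created).
df <- data.frame(
  sample1 = c(1L, 2L, 5L, 5L, 2L),
  sample2 = c(2L, 2L, 4L, 6L, 6L),
  sample3 = c(3L, 5L, 5L, 5L, 6L),
  median  = c(2L, 2L, 5L, 5L, 6L)
)

# Same grouping as above, but flatten each group row by row via c(t(...)).
out <- by(df, df$median, function(z) c(t(subset(z, select = -median))))

# Copy the elements into a plain list named median2, median5, median6.
lst <- setNames(
  lapply(seq_along(out), function(i) out[[i]]),
  paste0("median", dimnames(out)[[1]])
)
str(lst)
```

Indexing by position with out[[i]] sidesteps the extra "by" attributes, so lst prints like an ordinary list.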
Upvotes: 2