Mike de Groot

Reputation: 171

Subset dataframe by column value

I am an inexperienced data wrangler so forgive my simple language.

I have a dataframe

 df
  sample1 sample2 sample3 median
1       1       2       3      2
2       2       2       5      2
3       5       4       5      5
4       5       6       5      5
5       2       6       6      6

and, for each unique value in the median column, I want to collect all the sample values from the rows that have that median.

Something like this:

median2
[1] 1 2 3 2 2 5

median5
[1] 5 4 5 5 6 5

median6
[1] 2 6 6

I want to do this for a large dataframe, so I don't want to write one line per median value by hand, like

median2 <- df[df$median == '2',]

Upvotes: 1

Views: 206

Answers (3)

Zaw

Reputation: 1474

Here is a tidyverse approach.

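library(tidyverse)

# `df` is the dataframe from the question
dat1 <- df |> 
  mutate(median = paste0("median", median)) |> 
  group_by(median) |> 
  summarise(new_var = list(
    # cur_data() is deprecated since dplyr 1.1.0; pick(everything()) is its replacement
    as.integer(t(cur_data()))
  )) |> 
  deframe()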

dat1

# $median2
# [1] 1 2 3 2 2 5
# 
# $median5
# [1] 5 4 5 5 6 5
# 
# $median6
# [1] 2 6 6

Steps:

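  1. Within each group of median values, transpose the group's data (returned by cur_data()) with t(), which turns it into a matrix, and then flatten it with as.integer(). Transposing first makes the values come out row by row, which is the order asked for in the question.

  2. deframe() from the tibble package converts the two-column result (names in the first column, list column in the second) into a named list.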
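
To see why the transpose in step 1 matters, here is a minimal sketch on a single group. The object `grp` is a hypothetical hand-built copy of the two rows with median 2 from the question, not something the answer itself defines.

```r
# Hypothetical copy of the median == 2 group from the question
grp <- data.frame(
  sample1 = c(1L, 2L),
  sample2 = c(2L, 2L),
  sample3 = c(3L, 5L)
)

# Without t(): the matrix is flattened column by column
by_column <- as.integer(as.matrix(grp))

# With t(): columns of the transposed matrix are the rows of grp,
# so flattening walks through grp row by row
by_row <- as.integer(t(grp))

by_column
# [1] 1 2 2 2 3 5
by_row
# [1] 1 2 3 2 2 5
```

Only the second ordering matches the median2 vector requested in the question.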

Upvotes: 1

Ronak Shah

Reputation: 388807

You can flatten the sample columns into a single vector and then use split() on it.

cols <- grep('sample', names(df))
split(c(t(df[cols])), paste0('median', rep(df$median, each = length(cols))))

#$median2
#[1] 1 2 3 2 2 5

#$median5
#[1] 5 4 5 5 6 5

#$median6
#[1] 2 6 6
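
The one-liner above can be unpacked into intermediate steps. This is only a sketch: `df` is rebuilt by hand from the question's table, and the names `values`, `groups` and `res` are introduced here for illustration.

```r
# The question's dataframe, rebuilt by hand
df <- data.frame(
  sample1 = c(1L, 2L, 5L, 5L, 2L),
  sample2 = c(2L, 2L, 4L, 6L, 6L),
  sample3 = c(3L, 5L, 5L, 5L, 6L),
  median  = c(2L, 2L, 5L, 5L, 6L)
)

cols <- grep('sample', names(df))

# Flatten the sample columns row by row: t() transposes, c() drops the dimensions
values <- c(t(df[cols]))

# One group label per flattened value: each row's median repeated once per sample column
groups <- paste0('median', rep(df$median, each = length(cols)))

# Both vectors must line up element by element for split() to group correctly
stopifnot(length(values) == length(groups))

res <- split(values, groups)
res$median5
# [1] 5 4 5 5 6 5
```

Because the labels are generated from df$median itself, nothing has to be written per median value, which is what makes this workable for a large dataframe.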

Upvotes: 1

r2evans

Reputation: 160407

out <- by(df, df$median, function(z) unlist(subset(z, select = -median), use.names = FALSE))
out
# df$median: 2
# [1] 1 2 2 2 3 5
# --------------------------------------------------------------------------------------------------------------------------------------------------- 
# df$median: 5
# [1] 5 5 4 6 5 5
# --------------------------------------------------------------------------------------------------------------------------------------------------- 
# df$median: 6
# [1] 2 6 6

Realize that a by-class return is really just a glorified list, so it can be dealt with in the same ways:

str(out)
# List of 3
#  $ 2: int [1:6] 1 2 2 2 3 5
#  $ 5: int [1:6] 5 5 4 6 5 5
#  $ 6: int [1:3] 2 6 6
#  - attr(*, "dim")= int 3
#  - attr(*, "dimnames")=List of 1
#   ..$ df$median: chr [1:3] "2" "5" "6"
#  - attr(*, "call")= language by.data.frame(data = df, INDICES = df$median, FUN = function(z) unlist(subset(z, select = -median), use.names = FALSE))
#  - attr(*, "class")= chr "by"
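
As a rough illustration of treating the by result like a list, the sketch below extracts one element and rebuilds a plain named list. `df` is recreated by hand from the question, and `res` plus the "median" name prefix are assumptions added here to mirror the naming the question asked for.

```r
# The question's dataframe, rebuilt by hand
df <- data.frame(
  sample1 = c(1L, 2L, 5L, 5L, 2L),
  sample2 = c(2L, 2L, 4L, 6L, 6L),
  sample3 = c(3L, 5L, 5L, 5L, 6L),
  median  = c(2L, 2L, 5L, 5L, 6L)
)

out <- by(df, df$median, function(z) unlist(subset(z, select = -median), use.names = FALSE))

# Elements can be extracted by position, just like a list
out[[2]]
# [1] 5 5 4 6 5 5

# For a one-dimensional array, names() returns dimnames[[1]]
names(out)
# [1] "2" "5" "6"

# Rebuild a plain named list without the "by" class and its extra attributes
res <- setNames(
  lapply(seq_along(out), function(i) out[[i]]),
  paste0("median", names(out))
)
res$median6
# [1] 2 6 6
```

Note that unlist() flattens column by column, so the values within each element are in column order rather than the row order shown in the question.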

Upvotes: 2
