WindSur

Reputation: 140

How to filter rows for every column

I have a large data frame (data.txt). The first column holds the gene names, and the remaining columns hold the samples. An example of this data frame:

[image: example of the data frame]

I followed this post:

How to filter rows for every column independently using dplyr

because it is exactly what I am looking for. I want to create 3 subsets depending on the gene value: one subset each for values <0, ==0, and >0.

But I get this error:

Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 448    rows: * 45317, 50187 * 64477, 65535 * 146028, 148040

I have used this code:

Data <- read.table("data_CNA.txt",sep="\t", header=TRUE)
library(tidyverse)
gain <- Data %>% gather(name, value, -Hugo_Symbol) %>% filter(value >= 1) %>% spread(name, value)

If you have a better idea than this, it is welcome! Thanks.

Upvotes: 0

Views: 147

Answers (1)

Neel Kamal

Reputation: 1076

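To create subsets based on the column values, you can add a temp_field that labels each gene value as <0, ==0, or >0, and then split the data frame on that field using the split function from base R. Keeping the row name as an Id column means that rows sharing the same Hugo_Symbol stay distinct, which avoids the duplicate-keys error from spread.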

library(tidyverse)

# Keep the row name as Id so that duplicated gene symbols remain distinct rows
df_list <- Data %>% rownames_to_column(var = "Id") %>% 
  gather(name, value, -c(Hugo_Symbol, Id)) %>%
  mutate(temp_field = case_when(value < 0 ~ "loss",
                                value > 0 ~ "gain",
                                TRUE ~ "neutral"),
         temp_field = as.factor(temp_field)
  ) %>% split(., .$temp_field)

spread_df_func <- function(df){
  d <- df %>% select(Id,Hugo_Symbol, name, value) %>% spread(key = name, value = value)
  return(d)
}

org_df_list <- df_list %>% map(spread_df_func)

As I don't have data to test, the code above may contain syntax errors; however, it should be logically correct.
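To check the logic without the real file, here is a minimal sketch on a made-up data frame. The toy data and its values are hypothetical (they are not taken from data_CNA.txt); gene "A" appears twice on purpose, to mimic the duplicated Hugo_Symbol rows that presumably trigger the error in the question.

```r
library(tidyverse)

# Hypothetical toy data; gene "A" is duplicated on purpose.
toy <- data.frame(
  Hugo_Symbol = c("A", "A", "B", "C"),
  S1 = c(-1, -2, 2, 0),
  S2 = c(1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Long format; Id (the row name) keeps every original row unique.
toy_list <- toy %>%
  rownames_to_column(var = "Id") %>%
  gather(name, value, -c(Hugo_Symbol, Id)) %>%
  mutate(temp_field = case_when(value < 0 ~ "loss",
                                value > 0 ~ "gain",
                                TRUE ~ "neutral")) %>%
  split(., .$temp_field)

# Back to wide format: one data frame per category.
toy_wide <- toy_list %>%
  map(~ .x %>% select(Id, Hugo_Symbol, name, value) %>% spread(name, value))

names(toy_wide)   # "gain" "loss" "neutral"
toy_wide$loss     # both "A" rows survive, told apart by Id
```

In each resulting data frame, cells whose value belongs to another category come back as NA, and a sample column with no value in that category is dropped entirely (here, toy_wide$loss has no S2 column).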

Let me know if it solves your issue.

You may also refer to link, on splitting and merging data frames.
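In recent tidyr versions gather() and spread() are superseded by pivot_longer() and pivot_wider(), so the same idea could also be sketched as below. This is only an illustration on hypothetical toy data, not tested on the real file; the names toy and toy_split are made up for the example.

```r
library(tidyverse)

# Hypothetical toy data, not the asker's file; gene "A" is duplicated.
toy <- tibble(
  Hugo_Symbol = c("A", "A", "B"),
  S1 = c(-1, -2, 2),
  S2 = c(1, 0, 0)
)

toy_split <- toy %>%
  mutate(Id = row_number()) %>%            # unique row key, in place of rownames_to_column()
  pivot_longer(-c(Id, Hugo_Symbol)) %>%    # default output columns are "name" and "value"
  mutate(temp_field = case_when(value < 0 ~ "loss",
                                value > 0 ~ "gain",
                                TRUE ~ "neutral")) %>%
  split(., .$temp_field) %>%
  map(~ pivot_wider(select(.x, -temp_field),
                    names_from = name, values_from = value))

toy_split$loss
```

As with spread(), the Id column is what keeps the two "A" rows apart when the data go back to wide format.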

Upvotes: 1
