Reputation: 140
I have a large data frame (data.txt). The first column holds the gene names, and the remaining columns hold the samples. An example of this df:
I followed this post:
How to filter rows for every column independently using dplyr
because it is exactly what I am looking for. I want to create 3 subsets depending on the gene value: one subset each for values <0, ==0, and >0.
But I get this error:
Error: Each row of output must be identified by a unique combination of keys. Keys are shared for 448 rows: * 45317, 50187 * 64477, 65535 * 146028, 148040
I have used this code:
Data <- read.table("data_CNA.txt",sep="\t", header=TRUE)
library(tidyverse)
gain <- Data %>%
  gather(name, value, -Hugo_Symbol) %>%
  filter(value >= 1) %>%
  spread(name, value)
If you have a better idea than this, it is welcome! Thanks.
Upvotes: 0
Views: 147
Reputation: 1076
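To create subsets based on the column values, you can create a temp_field that labels each gene value as <0, ==0, or >0, and then split the data frame using the split function from base R. The extra Id column keeps every original row uniquely identifiable: spread raises the "unique combination of keys" error when the same Hugo_Symbol and sample name pair occurs more than once, which most likely means some gene names are duplicated in your data.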
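library(tidyverse)

# Reshape to long format, label each value, then split into a list of data frames
df_list <- Data %>%
  rownames_to_column(var = "Id") %>%  # unique row key, so spread() works later
  gather(name, value, -c(Hugo_Symbol, Id)) %>%
  mutate(temp_field = case_when(value < 0 ~ "loss",
                                value > 0 ~ "gain",
                                TRUE ~ "neutral"),  # note: NA values also fall here
         temp_field = as.factor(temp_field)) %>%
  split(., .$temp_field)

# Spread one long subset back to wide format (one column per sample)
spread_df_func <- function(df) {
  df %>%
    select(Id, Hugo_Symbol, name, value) %>%
    spread(key = name, value = value)
}

# Apply to each subset: a list with elements gain, loss, neutral
org_df_list <- df_list %>% map(spread_df_func)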
As I don't have data to test with, the above code may contain syntax errors; however, it should be logically correct.
Let me know if it solves your issue.
You may also refer to this link on splitting and merging data frames.
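Since the code above was not tested against real data, here is a small self-contained sketch of the same idea on a made-up data frame. The toy values, the sample names Sample_A and Sample_B, and the duplicated TP53 row are assumptions for illustration only; it also uses pivot_longer and pivot_wider, the newer tidyr replacements for gather and spread, and labels zeros explicitly so that missing values would not be counted as neutral.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(purrr)

# Hypothetical toy data: TP53 appears twice to mimic duplicated gene names
Data <- data.frame(
  Hugo_Symbol = c("TP53", "TP53", "BRCA1", "EGFR"),
  Sample_A = c(-1, 0, 2, 0),
  Sample_B = c(1, -2, 0, 1),
  stringsAsFactors = FALSE
)

# Turn one long subset back into wide format, one column per sample
to_wide <- function(df) {
  df %>%
    select(Id, Hugo_Symbol, name, value) %>%
    pivot_wider(names_from = name, values_from = value)
}

subsets <- Data %>%
  rownames_to_column(var = "Id") %>%  # unique key for every original row
  pivot_longer(-c(Id, Hugo_Symbol), names_to = "name", values_to = "value") %>%
  mutate(temp_field = case_when(
    value < 0 ~ "loss",
    value > 0 ~ "gain",
    value == 0 ~ "neutral"  # NA values get no label and are dropped by split()
  )) %>%
  split(., .$temp_field) %>%
  map(to_wide)

# Each element is a wide data frame; cells outside the subset are NA
subsets$gain
```

Because Id is part of the key, both TP53 rows survive as separate rows instead of colliding when the data is widened again.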
Upvotes: 1