Reputation: 97
I have a list that contains different groups, and each group has several elements; for example:
>lst
grup name
A cancer
A diabetes
A Alzheimer's
A Carcinoma
A Lung Diseases
A Adenoma
A Hyperplasia
B Cortical
B Aortic Aneurysm
B Asthma
E Pneumonia
E Asthma
Now I want all possible pairs of elements within group 'A', within group 'B', and so on. There are seven elements in 'A', so the pairs I want are (cancer, diabetes), (cancer, Alzheimer's), (cancer, Carcinoma), (cancer, Lung Diseases), (cancer, Adenoma), (cancer, Hyperplasia), then the same for diabetes and every other element of group 'A', and then the same for groups 'B' and 'E'. In short, each element paired with every other element of its group. I tried the following code, but it does not give the correct answer: the resulting list is missing some elements.
Code:
spt <- split(lst, lst$name)  # split the list into groups
dis_name <- lapply(1:length(spt), function(x) as.character(spt[[x]][[2]]))
pr <- list()
for (k in 1:length(dis_name)) {
  grp <- dis_name[[k]]
  l <- length(grp)
  for (m in 1:l) {
    for (p in 1:l) {
      pr[m][p] <- list(NULL)
      cm <- paste(grp[m], ",", grp[p])
      pr[[m]][[p]] <- list(cm = cm)
    }
  }
}
pr
What is wrong with this? I can't work it out. This is only a small example; my real data set is huge, so how could I run this in parallel with the foreach and doSNOW packages? Any help is appreciated. Thanks.
My desired output is:
[[1]]
[[1]][[2]]
"cancer , diabetes"
[[1]][[3]]
"cancer , Alzheimer's"
[[1]][[4]]
"cancer , Carcinoma"
[[1]][[5]]
"cancer , Lung Diseases"
[[1]][[6]]
"cancer , Adenoma"
[[1]][[7]]
"cancer , Hyperplasia"
[[2]]
[[2]][[1]]
"diabetes , cancer"
[[2]][[3]]
"diabetes , Alzheimer's"
.
.
.
[[2]][[7]]
"diabetes , Hyperplasia"
[[3]]
[[3]][[1]]
"Alzheimer's , cancer"
.
.
.
[[3]][[7]]
"Alzheimer's , Hyperplasia"
[[4]]
[[4]][[1]]
.
.
.
[[4]][[7]]
[[5]]
[[5]][[1]]
.
.
.
[[5]][[7]]
[[6]]
[[6]][[1]]
.
.
.
[[7]]
[[7]][[1]]
.
.
.
The same for the elements of 'B' and 'E':
[[2]]
[[1]]
[[1]][[2]]
"Cortical , Aortic Aneurysm"
[[1]][[3]]
"Cortical , Asthma"
[[2]]
[[2]][[1]]
"Aortic Aneurysm , Cortical"
[[2]][[3]]
"Aortic Aneurysm , Asthma"
[[3]]
[[3]][[1]]
.
.
[[3]][[2]]
[[3]]
[[1]]
[[1]][[2]]
"Pneumonia , Asthma"
[[2]]
[[2]][[1]]
"Asthma , Pneumonia"
The output should look like that, except that pairs containing the same names in a different order count as one pair. For example, "Asthma , Pneumonia" is the same as "Pneumonia , Asthma", so they should be treated as a single pair. Thanks.
Hello again. Here is a small part of my real data, for which the solution given below does not work. I cannot understand what is wrong, because the example I gave previously has the same structure as my real data, yet the lapply call fails with an error. I have been trying to fix the error but I can't. Any help is appreciated.
sort_gene:
data.geneSymbol data.diseaseName
A2M Acute Kidney Injury
A2M Adenoma, Liver Cell
A2M Alzheimer Disease
A2M Carcinoma, Hepatocellular
A2M Colonic Neoplasms
A2M Lung Diseases
A2M Lung Neoplasms
A2M Nephrotic Syndrome
A4GALT Blood group antigen p
A4GALT Burkitt Lymphoma
A4GALT Hyperostosis, Cortical, Congenital
AAA1 Aortic Aneurysm, Familial Abdominal 1
AAA2 Aortic Aneurysm, Familial Abdominal 2
Error: Error in FUN(X[[i]], ...) : n < m
Please get me out of this. I really need help. Thanks
Upvotes: 0
Views: 135
Reputation: 956
I think this does what you need. The second line is basically what nicola suggested, and the third formats the output.
lst <- data.frame(
  grup = c(rep("A", 7), rep("B", 3), "E", "E"),
  name = c("cancer", "diabetes", "Alzheimer's", "Carcinoma", "Lung Diseases",
           "Adenoma", "Hyperplasia", "Cortical", "Aortic Aneurysm", "Asthma",
           "Pneumonia", "Asthma")
)
# all unordered pairs of names within each group
output <- lapply(split(lst$name, lst$grup), combn, 2, simplify = FALSE)
# make sure each pair is a character vector (name may be a factor)
output <- lapply(output, function(x) lapply(x, as.character))
Then turn each pair into a single string rather than a vector and calculate the frequency of each pair:
output <- lapply(output, function(x) lapply(x, paste, collapse = " "))
table(unlist(output))
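Regarding the `n < m` error in the updated question: that is the message `combn` stops with when a group has fewer elements than the requested pair size, which is the case for AAA1 and AAA2 in the posted `sort_gene` sample (one disease each). A minimal sketch, assuming a data frame with those column names, that drops such groups before pairing. The helper name `pairs_by_group` is hypothetical and the rows are a shortened copy of the sample.

```r
# Shortened copy of the sort_gene sample from the question.
sort_gene <- data.frame(
  data.geneSymbol = c("A2M", "A2M", "A2M", "A4GALT", "A4GALT", "AAA1", "AAA2"),
  data.diseaseName = c("Acute Kidney Injury", "Alzheimer Disease", "Lung Diseases",
                       "Blood group antigen p", "Burkitt Lymphoma",
                       "Aortic Aneurysm, Familial Abdominal 1",
                       "Aortic Aneurysm, Familial Abdominal 2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unordered pairs within each group, skipping groups
# with fewer than two elements (combn would stop with "n < m" on those).
pairs_by_group <- function(values, groups, sep = " , ") {
  grouped <- split(as.character(values), groups)
  grouped <- grouped[lengths(grouped) >= 2]
  lapply(grouped, function(x) {
    utils::combn(x, 2, FUN = paste, collapse = sep, simplify = FALSE)
  })
}

gene_pairs <- pairs_by_group(sort_gene$data.diseaseName, sort_gene$data.geneSymbol)
names(gene_pairs)    # groups that survived the filter
lengths(gene_pairs)  # number of pairs per group
```

Whether single-element groups should be dropped or kept as empty entries depends on what you do with the result afterwards; dropping them is just the simplest choice here.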
Upvotes: 1
Reputation: 24510
Try this (lst is from Dan Lewer's answer):
setNames(lapply(split(lst$name, lst$grup),
function(x) combn(x,2,simplify=FALSE,FUN=paste,collapse=" , ")),NULL)
#[[1]]
#[[1]][[1]]
#[1] "cancer , diabetes"
#
#[[1]][[2]]
#[1] "cancer , Alzheimer's"
#
#[[1]][[3]]
#[1] "cancer , Carcinoma"
#...
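On the parallel part of the question: each group is processed independently, so the per-group call can be spread over workers. A rough sketch using base R's `parallel` package instead of foreach/doSNOW (that swap is mine, not something the question asked for). The worker count of 2 and the helper name `pair_strings` are assumptions, and the data is a shortened version of `lst`. For modestly sized data the plain `lapply` is probably fast enough, since starting a cluster has its own cost.

```r
library(parallel)

# Shortened version of the example data.
lst <- data.frame(
  grup = c(rep("A", 3), rep("B", 3), "E", "E"),
  name = c("cancer", "diabetes", "Adenoma",
           "Cortical", "Aortic Aneurysm", "Asthma",
           "Pneumonia", "Asthma"),
  stringsAsFactors = FALSE
)

groups <- split(lst$name, lst$grup)

# Hypothetical per-group function: every unordered pair as one string.
pair_strings <- function(x) {
  utils::combn(x, 2, FUN = paste, collapse = " , ", simplify = FALSE)
}

cl <- makeCluster(2)  # worker count is an assumption; adjust to your machine
par_pairs <- tryCatch(
  parLapply(cl, groups, pair_strings),  # one task per group
  finally = stopCluster(cl)             # always shut the workers down
)
```

The result should match what `lapply(groups, pair_strings)` returns, just computed on the workers.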
Upvotes: 1