priyanka nimavat

Reputation: 97

Create pairs of list elements in R

I have a list that contains several groups, and each group has some elements. For example:

>lst
grup   name
A      cancer
A      diabetes
A      Alzheimer's
A      Carcinoma
A      Lung Diseases
A      Adenoma
A      Hyperplasia
B      Cortical
B      Aortic Aneurysm
B      Asthma
E      Pneumonia
E      Asthma

Now I want all possible pairs of elements within each group ('A', 'B', and so on). Group 'A' has seven elements, so the pairs I want are (cancer, diabetes), (cancer, Alzheimer's), (cancer, Carcinoma), (cancer, Lung Diseases), (cancer, Adenoma), (cancer, Hyperplasia), then the same for diabetes and every other element of group 'A', and likewise for groups 'B' and 'E'. In short, I want each element paired with every other element of its group. I tried the following code, but it does not give the correct answer: the resulting list has some elements missing.

Code:

spt <- split(lst, lst$name)# split the list into group
dis_name <- lapply(1:length(spt), function(x) as.character(spt[[x]][[2]]))
pr <- list()
for(k in 1:length(dis_name))
{
  grp <- dis_name[[k]]
  l <- length(grp)

  for(m in 1:l)
  {
    for(p in 1:l)
    {

      pr[m][p] <- list(NULL) 
      cm <- paste(grp[m],",", grp[p])
      pr[[m]][[p]] <- list(cm = cm) 
    }

  }
}

pr

I can't understand what is wrong with this. This is a small example; my real data is huge, so how could I run this in parallel with the foreach and doSNOW packages? Any help is appreciated. Thanks.

My desired output is:

[[1]]
[[1]][[2]]
"cancer , diabetes"
[[1]][[3]]
"cancer , Alzheimer's"
[[1]][[4]]
"cancer , Carcinoma"
[[1]][[5]]
"cancer , Lung Diseases"
[[1]][[6]]
"cancer , Adenoma"
[[1]][[7]]
"cancer , Hyperplasia"
[[2]]
[[2]][[1]]
"diabetes , cancer"
[[2]][[3]]
"diabetes , Alzheimer's"
.
.
.
[[2]][[7]]
"diabetes , Hyperplasia"
[[3]]
[[3]][[1]]
"Alzheimer's , cancer"
.
.
.
[[3]][[7]]
"Alzheimer's , Hyperplasia"
[[4]]
[[4]][[1]]
.
.
.
[[4]][[7]]
[[5]]
[[5]][[1]]
.
.
.
[[5]][[7]]
[[6]]
[[6]][[1]]
.
.
.
[[7]]
[[7]][[1]]
.
.
.

Same for the elements of 'B' and 'E'

[[2]]
[[1]]
[[1]][[2]]
"Cortical , Aortic Aneurysm"
[[1]][[3]]
"Cortical , Asthma"
[[2]]
[[2]][[1]]
"Aortic Aneurysm , Cortical"
[[2]][[3]]
"Aortic Aneurysm , Asthma"
[[3]]
[[3]][[1]]
.
.
[[3]][[2]]
[[3]]
[[1]]
[[1]][[2]]
"Pneumonia , Asthma"
[[2]]
[[2]][[1]]
"Asthma , Pneumonia"

The output should look like the above, except that pairs containing the same names in a different order should count as one pair. For example,

"Asthma , Pneumonia"

is the same as "Pneumonia , Asthma", so the two should be treated as one pair. Thanks.

Hello again. Below is a small part of my real data, for which the solution given below does not work. I can't understand what is wrong: the example I gave earlier has the same structure as my real data, yet lapply fails with the error shown. I have tried to fix the error but can't. I would really appreciate any help.

 sort_gene:
 data.geneSymbol    data.diseaseName
 A2M                Acute Kidney Injury
 A2M                Adenoma, Liver Cell
 A2M                Alzheimer Disease
 A2M                Carcinoma, Hepatocellular
 A2M                Colonic Neoplasms
 A2M                Lung Diseases
 A2M                Lung Neoplasms
 A2M                Nephrotic Syndrome
 A4GALT             Blood group antigen p
 A4GALT             Burkitt Lymphoma
 A4GALT             Hyperostosis, Cortical, Congenital
 AAA1               Aortic Aneurysm, Familial Abdominal 1
 AAA2               Aortic Aneurysm, Familial Abdominal 2

Error:

Error in FUN(X[[i]], ...) : n < m

Please get me out of this. I really need help. Thanks

Upvotes: 0

Views: 135

Answers (2)

Dan Lewer

Reputation: 956

I think this does what you need. The second line is basically what nicola suggested, and the third formats the output.

lst <- data.frame(
  grup = c(rep("A", 7), rep("B", 3), "E", "E"),
  name = c("cancer", "diabetes", "Alzheimer's", "Carcinoma", "Lung Diseases",
           "Adenoma", "Hyperplasia", "Cortical", "Aortic Aneurysm", "Asthma",
           "Pneumonia", "Asthma")
)
# all 2-element combinations within each group
output <- lapply(split(lst$name, lst$grup), combn, 2, simplify = FALSE)
# make sure every pair is a character vector
output <- lapply(output, function(x) lapply(x, as.character))

Then turn each pair into a single string rather than a vector and calculate the frequency of each pair:

output <- lapply(output, function(x) lapply(x, paste, collapse = " "))
table(unlist(output))
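
The question also asks that "Asthma , Pneumonia" and "Pneumonia , Asthma" count as the same pair. Within one group combn already returns each unordered pair once, but the same two names can appear in a different order in another group. A minimal sketch of one way to handle that, assuming it is acceptable to sort the two names alphabetically before pasting them (pair_key is a hypothetical helper name, not part of the original answer):

```r
# Sample data from the answer above; stringsAsFactors = FALSE keeps names as character.
lst <- data.frame(
  grup = c(rep("A", 7), rep("B", 3), "E", "E"),
  name = c("cancer", "diabetes", "Alzheimer's", "Carcinoma", "Lung Diseases",
           "Adenoma", "Hyperplasia", "Cortical", "Aortic Aneurysm", "Asthma",
           "Pneumonia", "Asthma"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sort the two names so the key does not depend on their order.
pair_key <- function(p) paste(sort(p), collapse = " , ")

# One list of pair keys per group.
pairs_by_group <- lapply(split(lst$name, lst$grup), function(x) {
  combn(x, 2, FUN = pair_key, simplify = FALSE)
})

# Frequency of each unordered pair across all groups.
pair_counts <- table(unlist(pairs_by_group))
```

With this keying, the pair from group 'E' is stored as "Asthma , Pneumonia" regardless of the order in which the two rows appear in the data.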

Upvotes: 1

nicola

Reputation: 24510

Try this (lst is from Dan Lewer's answer):

setNames(lapply(split(lst$name, lst$grup),
         function(x) combn(x,2,simplify=FALSE,FUN=paste,collapse=" , ")),NULL)
#[[1]]
#[[1]][[1]]
#[1] "cancer , diabetes"
#
#[[1]][[2]]
#[1] "cancer , Alzheimer's"
#
#[[1]][[3]]
#[1] "cancer , Carcinoma"
#...

Upvotes: 1
