Reputation: 969
I am using Python 2.7. I have a file that contains chromosomal locations and experiment IDs. At the moment I have this information stored in two lists:
unique_locations - containing a single value for each location
location_exp - containing lists of [location, experiment]
The reason I have not used a dictionary is that the same location can be found in multiple experiments, and each experiment contains multiple locations, i.e. this is a many-to-many relationship.
I would like to find out in how many (and which) experiments each location is found, i.e. get a list like:
[
[location1, [experiment1, experiment2, experiment3]],
[location2, [experiment2, experiment3, experiment4]]
]
etc.
As the lengths of the lists are different, I have failed using an enumerate(list) loop on either list. I did try:
location_experiment_sorted = []
for i, item in enumerate(unique_experiment):
    location = item[0]
    exp = item[1]
    if location not in location_experiment_sorted:
        location_experiment_sorted.append([location, exp])
    else:
        location_experiment_sorted[i].append(exp)
That was one of several attempts. I have also tried using a dictionary that maps each location to a list of experiments. Can anyone point me in the right direction?
Upvotes: 2
Views: 77
Reputation: 10951
Here is another working example, using the built-in dict and groupby from itertools:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> d = {}
>>> location_exp = [
...     ("location1", "experiment1"),
...     ("location1", "experiment2"),
...     ("location1", "experiment3"),
...     ("location2", "experiment2"),
...     ("location2", "experiment3"),
...     ("location2", "experiment4")
... ]
>>> for k, v in groupby(location_exp, itemgetter(0)):
...     d.setdefault(k, [])
...     d[k].extend([exp for _, exp in v])
...
[]
[]
>>> d
{'location2': ['experiment2', 'experiment3', 'experiment4'], 'location1': ['experiment1', 'experiment2', 'experiment3']}
>>>
>>> d2 = {}
>>> location_exp2 = [
...     ("location1", "experiment1"),
...     ("location2", "experiment2"),
...     ("location3", "experiment3"),
...     ("location1", "experiment2"),
...     ("location2", "experiment3"),
...     ("location3", "experiment4")
... ]
>>> for k, v in groupby(location_exp2, itemgetter(0)):
...     d2.setdefault(k, [])
...     d2[k].extend([exp for _, exp in v])
...
[]
[]
[]
['experiment1']
['experiment2']
['experiment3']
>>> d2
{'location2': ['experiment2', 'experiment3'], 'location1': ['experiment1', 'experiment2'], 'location3': ['experiment3', 'experiment4']}
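Note that groupby only groups consecutive items with the same key, which is why the unsorted second run visits each location twice and the loop has to extend the existing lists. A minimal sketch of an alternative, assuming it is acceptable to sort the pairs by location first so that each location forms exactly one group (the names by_location and grouped are just for illustration):

```python
from itertools import groupby
from operator import itemgetter

# Same unsorted sample data as location_exp2 in the session above.
location_exp2 = [
    ("location1", "experiment1"),
    ("location2", "experiment2"),
    ("location3", "experiment3"),
    ("location1", "experiment2"),
    ("location2", "experiment3"),
    ("location3", "experiment4"),
]

# Sort by location only. sorted() is stable, so the experiments of each
# location keep their original relative order.
by_location = sorted(location_exp2, key=itemgetter(0))

# After sorting, every location appears as one consecutive run,
# so groupby yields exactly one group per location.
grouped = {
    location: [exp for _, exp in pairs]
    for location, pairs in groupby(by_location, key=itemgetter(0))
}
```

The sort costs O(n log n), so for large inputs the plain dictionary loop may still be the simpler choice.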
Upvotes: 1
Reputation: 2656
If I understand you correctly, and if locations can be used as dict keys (i.e. they are hashable), you could do:
location_experiments = {}
for location, experiment in location_exp:
    location_experiments.setdefault(location, []).append(experiment)
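If you need the list-of-lists shape from the question rather than a dict, the dictionary can be converted afterwards. A rough sketch, assuming sample data in the form of (location, experiment) pairs and that sorting the result by location is acceptable (the name result is only illustrative):

```python
# Sample data (an assumption): (location, experiment) pairs as in the question.
location_exp = [
    ("location1", "experiment1"),
    ("location2", "experiment2"),
    ("location1", "experiment2"),
    ("location2", "experiment3"),
    ("location1", "experiment3"),
    ("location2", "experiment4"),
]

# Group experiments by location, as in the loop above.
location_experiments = {}
for location, experiment in location_exp:
    location_experiments.setdefault(location, []).append(experiment)

# Convert to [[location, [experiment, ...]], ...], ordered by location.
result = [
    [location, experiments]
    for location, experiments in sorted(location_experiments.items())
]
```

Sorting the items makes the order deterministic, since a plain dict in Python 2.7 does not preserve insertion order.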
Upvotes: 2
Reputation: 405
Try defaultdict, i.e.:
from collections import defaultdict

unique_locations = ["location1", "location2"]
location_exp = [
    ("location1", "experiment1"),
    ("location1", "experiment2"),
    ("location1", "experiment3"),
    ("location2", "experiment2"),
    ("location2", "experiment3"),
    ("location2", "experiment4")
]

location_experiment_dict = defaultdict(list)
for location, exp in location_exp:
    location_experiment_dict[location].append(exp)

print(location_experiment_dict)
This prints:
defaultdict(<type 'list'>, {
'location2': ['experiment2', 'experiment3', 'experiment4'],
'location1': ['experiment1', 'experiment2', 'experiment3']
})
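Since the question asks in how many experiments each location is found, the counts can be derived from the same grouping. A small sketch, assuming that a repeated (location, experiment) pair should be counted only once, hence a set instead of a list; the duplicate entry in the sample data and the names experiments_by_location and counts are made up for illustration:

```python
from collections import defaultdict

unique_locations = ["location1", "location2"]
location_exp = [
    ("location1", "experiment1"),
    ("location1", "experiment2"),
    ("location1", "experiment3"),
    ("location1", "experiment3"),  # duplicate pair, added for illustration
    ("location2", "experiment2"),
    ("location2", "experiment3"),
    ("location2", "experiment4"),
]

# A set per location drops repeated experiments automatically.
experiments_by_location = defaultdict(set)
for location, exp in location_exp:
    experiments_by_location[location].add(exp)

# .get() avoids creating empty entries for locations that never occur.
counts = [
    (location, len(experiments_by_location.get(location, ())))
    for location in unique_locations
]
```

Iterating over unique_locations keeps the counts in the same order as that list.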
Upvotes: 2
Reputation: 641
I haven't run this, so apologies if it fails. If it's a list of lists like [ [location, experiment], [location, experiment] ], then:
locationList = {}
for item in unique_experiment:
    location = item[0]
    exp = item[1]
    # Create an empty list the first time a location is seen.
    if location not in locationList:
        locationList[location] = []
    locationList[location].append(exp)
Upvotes: 1