Reputation: 765
Let's say I have a list, with pipes denoting a hierarchy.
l = ['animals|fish|salmon', 'fish|salmon', 'fish', 'animals', 'furniture',
'animals|big cats|lions', 'animals|birds|fisher bird']
I want to return a list with all the entries that are not redundant:
l = ['animals|fish|salmon', 'animals|big cats|lions',
'animals|birds|fisher bird', 'furniture']
I tried various variations of sorting the list by length and then using the "any" keyword to find elements that were contained within one of these entries.
l2 = paths.sort(key=len)
for i in l2:
if any(i in j for j in l if i != j):
...
But this wasn't really successful. Can anyone recommend a better way of approaching this?
Thanks!
Upvotes: 2
Views: 83
Reputation: 23624
Earlier I asked "what happens if you have say l = ['animals|fish', 'fish|salmon']
?" And I was bored so I've fiddled around with a solution to return all unique chains in full.
now ['animals|fish', 'fish|salmon'] -> ['animals|fish|salmon']
and ['animals|fish|trout', 'fish|salmon'] -> ['animals|fish|salmon','animals|fish|trout']
You can create a Node for each species containing a list of child species, and its parent species. For each animal individual animal in the list you can create a node. For each string in the list, you link the child animal to the parent animal in-front of it.
When you're done you print the hierarchy of all animals that do not have a parent.
class SpeciesNode(object):
def __init__(self, name):
self.parent = None
self.name = name
self.children = []
def add_node(self,s_node):
if not s_node in self.children:
s_node.parent = self
self.children.append(s_node)
def get_branches(self):
if len(self.children) == 0:
yield self.name
else:
for child in self.children:
for branch in child.get_branches():
yield self.name + '|' + branch
Now you can define a function to convert a list of hierarchies into a list of nodes.
def get_s_nodes(animal_list):
s_nodes = {}
for hierarchy in animal_list:
h_list = hierarchy.split('|')
parent = None
for species in h_list:
if not species in s_nodes.keys():
s_nodes[species] = SpeciesNode(species)
if parent is not None:
s_nodes[parent].add_node(s_nodes[species])
parent = species
return s_nodes.values()
Finally convert this back to a list of strings
def get_animal_list(s_nodes):
animal_kingdom = []
for node in s_nodes:
if node.parent is None:
for branch in child.get_branches():
animal_kingdom.append(branch)
return animal_kingdom
so:
>>> l = ['animals|fish|salmon', 'fish|salmon','fish|trout', 'salmon|salmon eggs', 'fish', 'animals',
'furniture', 'animals|big cats|lions', 'animals|birds|fisher bird']
>>> get_animal_list(get_s_nodes(l))
['animals|fish|salmon|salmon eggs', 'animals|fish|trout', 'animals|big cats|lions', 'animals|birds|fisher bird', 'furniture']
>>>
Upvotes: 1
Reputation: 250961
You can do something like this.
Here I am using a set
to keep track of already seen items. I loop over each item, split it at whitespaces first and then at |
, and next step is to check if any of the item in that list is not present in seen
set or not, if yes then store that string in out
list and the items of the list are added to the seen
set.
lis = ['animals|fish|salmon', 'fish|salmon', 'fish', 'animals', 'furniture',
'animals|big cats|lions', 'animals|birds|fisher bird']
seen = set()
out = []
for x in lis:
items = [z for y in x.split() for z in y.split('|')]
if any(y not in seen for y in items):
seen.update(items)
out.append(x)
print out
#['animals|fish|salmon', 'furniture', 'animals|big cats|lions', 'animals|birds|fisher bird']
Upvotes: 2
Reputation: 9953
I think the definition of "redundant" isn't 100% clear here. Your post seems to indicate that if all of the text in one entry appears in another that's redundant, but I don't think that's really what you mean. For example ['fish', 'fishers']
should probably not be reduced to just ['fishers']
even though 'fish' in 'fishers' == True
.
I'm guessing what you want is closer to this:
def is_redundant(a, b):
"""Is a redundant given b?"""
a_parts = set(a.split('|'))
b_parts = set(b.split('|'))
return len(a_parts.intersect(b_parts) == len(b_parts)
Here's a simple (but inefficient) solution:
for item in l:
l = filter(lambda x: not is_redundant(x, item), l)
Upvotes: 0
Reputation: 32429
I am not sure if this is what you are looking for:
l = ['animals|fish|salmon', 'fish|salmon', 'fish', 'animals', 'furniture',
'animals|big cats|lions', 'animals|birds|fisher bird']
def simplify (data):
data = ['|{}|'.format (e) for e in data]
return [e [1:-1] for e in data if all (e is other or e not in other for other in data) ]
print (simplify (l) )
It prints:
['animals|fish|salmon', 'furniture', 'animals|big cats|lions', 'animals|birds|fisher bird']
What I do:
First step: put pipes at the beginning and end of each item '|{}|'.format
(to avoid conflicts with e.g. fish
and fisher bird
.
Second step: Filter the list, throwing away all items which are subpaths of another (e not in other
) except themselves (e is other or
). I also trim again the additional pipes (e [1:-1]
)
Upvotes: 3