Ananta
Ananta

Reputation: 3711

How do I make a sorted list of items based on multiple list of sorted items?

I am trying to get a sorted master list from smaller lists. This can be in python or R.

in R, I have

l1<-c("a","c","d")
l2<-c("a","b","e")
l3<-c("a","c","e")
l4<-c("a","b","c","e")
l5<-c("b","c","d")

m<-unique(c(l1,l2,l3,l4,l5))

The output I was expecting is a,b,c,d,e.

In python

l1=["a","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]

expected result is ["a","b","c","d","e"] I started with making a set and going over each list and checking with index but it started getting complex.

Thank you for help.

EDIT: I think I confused you with having a alphabetically sorted list. Those list items could be random. the real order may not be a,b,c,d,e.

l1=["e","a"]
l2=["e","b","d"]
l3=["b","d","a"]

in this case expected order is ["e","b","d","a"] not ["a","b","d","e"]

for more clarity, think about few people trying to name US states from east to west.

person 1 says, Florida, Louisiana, Nevada,California. 
person 2 says  Alabama, Mississippi, Louisiana, new Mexico, Nevada 
person 3 says Florida, Alabama, Texas, New Mexico, California
person 4 says Alabama, Mississippi, Texas, Nevada
person 5 says Mississippi Louisiana, Nevada

and I am trying to get right order from information above.

So, here, We will start with Florida, Louisiana, Nevada, California. Now, adding second, it would be, (Alabama, Florida),Louisiana,New Mexico, Nevada, California. Adding 3rd (breaks the tie of Alabama/Florida),Florida, Alabama, Louisiana, Texas, New Mexico, Nevada, California and adding 4th makes Florida, Alabama, (Mississippi/Louisiana), Texas, New Mexico, Nevada, California. adding 5th breaks tie of Mississippi and Louisiana.

Upvotes: 1

Views: 77

Answers (4)

Calum You
Calum You

Reputation: 15072

Here is an approach in R that turns the vectors into a directed acyclic graph using tidygraph, and then uses node_topo_order to derive the implied node order. Using your example states from east to west:

l1 <- c("Florida", "Louisiana", "Nevada", "California")
l2 <- c("Alabama", "Mississippi", "Louisiana", "New Mexico", "Nevada" )
l3 <- c("Florida", "Alabama", "Texas", "New Mexico", "California")
l4 <- c("Alabama", "Mississippi", "Texas", "Nevada")
l5 <- c("Mississippi", "Louisiana", "Nevada")

library(tidyverse)
library(tidygraph)
ew_graph <- list(l1, l2, l3, l4, l5) %>%
  map_dfr(~tibble(east = ., west = lead(.))) %>% # turn vectors into edge table
  filter(!is.na(west)) %>%
  as_tbl_graph()

ew_graph %>%  # Now we can order nodes and extract their names as output
  arrange(node_topo_order()) %>%
  pull(name)
#> [1] "Florida"     "Alabama"     "Mississippi" "Louisiana"   "Texas"      
#> [6] "New Mexico"  "Nevada"      "California"

Note that there can be more than one correct order and this will only return one of them. If we want, we can also plot the graph to see the relationships more clearly, which reveals that in this data we have a tie between Louisiana and Texas (you can't trace from one to the other), which you missed in constructing the example. It just happens that we have them in the "true" order. If you need to define a separate way of breaking ties, this approach will require some hacking.

library(ggraph)
ggraph(ew_graph) +
  geom_node_label(aes(label = name)) +
  geom_edge_link(
    mapping = aes(start_cap = label_rect(node1.name),
                  end_cap = label_rect(node2.name)),
    arrow = arrow(length = unit(4, 'mm'))
  )

Created on 2019-05-28 by the reprex package (v0.3.0)

Upvotes: 1

yzi0148
yzi0148

Reputation: 45

For Python:

# Create list of lists
lsts = [l1, l2, l3, l4, l5]
s = set()

# Add lists to set
for lst in lsts:
  s.update(lst)

# Sort set
sorted(s)

Edit After OP's update:

def sort_lists(lsts):
  list_of_hashes = []
  for lst in lsts:
    list_of_hashes.append({k: v for v, k in enumerate(lst)})

  result_hash = dict()
  for hash_item in list_of_hashes:
    for key, value in hash_item.items():
      if result_hash.get(key):
        result_hash[key] += value
      else:
        result_hash[key] = value
  print(result_hash)
  sorted_results = sorted(result_hash.items(), key=lambda kv: kv[1])
  print(sorted_results)
  return [tup[0] for tup in sorted_results]
# Test Case 1
l1=["e","a"]
l2=["e","b","d"]
l3=["b","d","a"]

print(sort_lists([l1,l2,l3]))
>> ['e', 'd', 'b', 'a']
# Test Case 2
s1 = ['Florida', 'Louisiana', 'Nevada', 'California']
s2 = ['Alabama', 'Mississippi', 'Louisiana', 'New Mexico', 'Nevada']
s3 = ['Florida', 'Alabama', 'Texas', 'New Mexico', 'California']
s4 = ['Alabama', 'Mississippi', 'Texas', 'Nevada']
s5 = ['Mississippi', 'Louisiana', 'Nevada']
print(sort_lists([s1,s2,s3,s4,s5]))
>> ['Florida', 'Alabama', 'Mississippi', 'Louisiana', 'Texas', 'New Mexico', 'California', 'Nevada']

Upvotes: 2

My solution has complexity O(n). Other solutions can have O(n log n):

Python: (For R it is similar)

l1=["a","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]

lsts = [l1, l2, l3, l4, l5]

solve = []
for p in range(130):
    solve.append(0)

for lst in lsts:
    for p in lst:
        solve[ord(p)] += 1

for idx, value in enumerate(solve):
    if value != 0:
        print chr(idx)

This solution is based in values from table ascii.

For your update:

l1=["z","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]

mySet = set()

mySet.update(l1, l2, l3, l4, l5)

result = sorted(mySet)
print(result)

Upvotes: 0

Mike
Mike

Reputation: 858

Aaaand the actual answer: https://www.python.org/doc/essays/graphs/

Good hunting! :D

This should do it for your original question:

l1=["a","c","d"]
l2=["a","b","e"]
l3=["a","c","e"]
l4=["a","b","c","e"]
l5=["b","c","d"]

s = set()
s.update(l1, l2, l3, l4, l5)
l = sorted(s)
l
#['a', 'b', 'c', 'd', 'e']

For your edited question, let's consider a slight variation on your second example:

l1=["e","a"]
l2=["e","b","d"]
l3=["b","c","a"]

(Squint at l3.) In this case, the set of lists is underdetermined, as there is no unique ordering between d and c. Without a rule for deciding ties, no algorithm is possible.

Upvotes: 3

Related Questions