Erich Purpur
Erich Purpur

Reputation: 1509

More than 1 longest string in list?

I have a list of all US state names.

states = ['Oklahoma', 'Kansas', 'North Carolina', 'Georgia', 'Oregon',
      'Mississippi', 'Minnesota', 'Colorado', 'Alabama',
      'Massachusetts', 'Arizona', 'Connecticut', 'Montana',
      'West Virginia', 'Nebraska', 'New York', 'Nevada', 'Idaho',
      'New Jersey', 'Missouri', 'South Carolina', 'Pennsylvania',
      'Rhode Island', 'New Mexico', 'Alaska', 'New Hampshire',
      'Tennessee', 'Washington', 'Indiana', 'Hawaii', 'Kentucky',
      'Virginia', 'Ohio', 'Wisconsin', 'Maryland', 'Florida',
      'Utah', 'Maine', 'California', 'Vermont', 'Arkansas', 'Wyoming',
      'Louisiana', 'North Dakota', 'South Dakota', 'Texas',
      'Illinois', 'Iowa', 'Michigan', 'Delaware']

I want to find the longest string in this list of items, which is easy enough with the following:

def longest_state(data):
    return(max(states,key=len))
print(longest_state(states)

This returns "North Carolina", which has a length of 14. However, "South Carolina" is also 14, but is not returned.

I tried to use the following stackoverflow thread which has an example to find multiple longest strings using a list comprehension but I was unable to make it work... Python's most efficient way to choose longest string in list?

I also tried using if/else statements to append the list item to another variable if it equaled the length of the current longest item but was unsuccessful

Can anyone help?

Upvotes: 3

Views: 710

Answers (7)

wiesion
wiesion

Reputation: 2445

This question got me wondering which of all the possible solutions would have the best performance, so i made a comparison between all of which came to my mind and were not posted already, and compared them to mine.

The groupby approach:

sorted_states = sorted(states, key=len, reverse=True)
grouped_states = next(groupby(sorted_states, key=len))
list(grouped_states[1])

groupby needs a sorted list to work properly, so there is the "overhead" of sorting the list beforehand, but most Python interpreters have heavily optimized sorting algorhythms. We stop the generator at the first group occurrence with next, so it does not continue fetching the remaining items.

The takewhile approach:

sorted_states = sorted(states, key=len, reverse=True)
max_length = len(sorted_states[0])
list(takewhile(lambda x: max_length == len(x), sorted_states))

This also needs a sorted list, as well the length of the first item, but it stops the gathering of the new list as soon as the lambda expectation is not met anymore.

The reduce approach:

def _keep_longest(a, v):
  if len(a) == 0 or len(v) >= len(a[-1]):
    a.append(v)
  return a

sorted_states = sorted(states, key=len, reverse=True)
reduce(_keep_longest, sorted_states, [])

This one needs a method to handle the comparison between the previous length as well a sorted list. All of its length comparisons and moving list from lambda to lambda make this a non-efficient method.

Other answers from this question

I included the other answers (The max and len from various posters, as well @Spencer Bard's, @Wim's and the other list comprehensions doing the len on the max scan over each comparison) in the tests as well to compare their performance

Results

Of course the results vary a lot but doing them over and over again (sample size 50_000 on repl.it) i can say that they are representative (let them also run a few times on my cPython 3.5):

max and len 50_000 times: 1.3888958770003228
sort and groupby 50_000 times: 1.405984859000455
sort and takewhile 50_000 times: 1.4154430249991492
spencer 50_000 times: 1.607105290000618
wim 50_000 times: 1.9011182049998752
sort and reduce 50_000 times: 4.612968634999561
comprehension 50_000 times: 27.802522705999763

Conclusion

The max and len approach posted multiple times here takes the cake, and is probably the most pythonic way as it is self-explanatory without resorting to list sorting or the usage of the itertools, functools or collections libraries.

Online demo here

Upvotes: 1

Murman
Murman

Reputation: 1

Another potential solution. Pretty short and sweet

def longest_state(data):
    return [state for state in data if len(state) == len(max(data, key=len))]

Upvotes: -1

vash_the_stampede
vash_the_stampede

Reputation: 4606

longest_state = max(states, key=len)

for i in states:
    if len(i) == len(longest_state):
        print(i)

Alternate format

longest_state = max(states, key=len)

[[print(i)] for i in states if len(i) == len(longest_state)]

Upvotes: 0

Ajinkya Chalke
Ajinkya Chalke

Reputation: 51

Hope this helps. Two pass approach, may not be the best. But certainly is O(n).

def longest_state(states):
    max_len = len(max(states, key=len))
    return [state for state in states if len(state) == max_len]

1 pass would be best, but this looks shorter.

Upvotes: 2

Jonas Wolff
Jonas Wolff

Reputation: 2244

s = len(max(states, key=len))
[i for i in states if len(i) == s]

Upvotes: 1

wim
wim

Reputation: 362717

Key a dict off of the lengths:

>>> from collections import defaultdict
>>> len2states = defaultdict(list)
>>> for state in states:
...     len2states[len(state)].append(state)
...     
>>> len2states[max(len2states)]
['North Carolina', 'South Carolina']

Upvotes: 1

Spencer Bard
Spencer Bard

Reputation: 1035

You could store all of the longest names in an array

def longest_state(data):
    cur_longest = []
    cur_longest_num = 0
    for state in data:
        if len(state) == cur_longest_num:
            cur_longest.append(state)
        elif len(state) > cur_longest_num:
            cur_longest = [state]
            cur_longest_num = len(state)
    return cur_longest

Upvotes: 3

Related Questions