Reputation: 1509

More than 1 longest string in list?

I have a list of all US state names.

states = ['Oklahoma', 'Kansas', 'North Carolina', 'Georgia', 'Oregon',
      'Mississippi', 'Minnesota', 'Colorado', 'Alabama',
      'Massachusetts', 'Arizona', 'Connecticut', 'Montana',
      'West Virginia', 'Nebraska', 'New York', 'Nevada', 'Idaho',
      'New Jersey', 'Missouri', 'South Carolina', 'Pennsylvania',
      'Rhode Island', 'New Mexico', 'Alaska', 'New Hampshire',
      'Tennessee', 'Washington', 'Indiana', 'Hawaii', 'Kentucky',
      'Virginia', 'Ohio', 'Wisconsin', 'Maryland', 'Florida',
      'Utah', 'Maine', 'California', 'Vermont', 'Arkansas', 'Wyoming',
      'Louisiana', 'North Dakota', 'South Dakota', 'Texas',
      'Illinois', 'Iowa', 'Michigan', 'Delaware']

I want to find the longest string in this list of items, which is easy enough with the following:

def longest_state(data):
    return(max(states,key=len))
print(longest_state(states)

This returns "North Carolina", which has a length of 14. However, "South Carolina" is also 14, but is not returned.

I tried to use the following stackoverflow thread which has an example to find multiple longest strings using a list comprehension but I was unable to make it work... Python's most efficient way to choose longest string in list?

I also tried using if/else statements to append the list item to another variable if it equaled the length of the current longest item but was unsuccessful

Can anyone help?

Upvotes: 3

Answers (7)

wiesion

Reputation: 2445

This question got me wondering which of all the possible solutions would have the best performance, so i made a comparison between all of which came to my mind and were not posted already, and compared them to mine.

The `groupby` approach:

sorted_states = sorted(states, key=len, reverse=True)
grouped_states = next(groupby(sorted_states, key=len))
list(grouped_states[1])

groupby needs a sorted list to work properly, so there is the "overhead" of sorting the list beforehand, but most Python interpreters have heavily optimized sorting algorhythms. We stop the generator at the first group occurrence with next, so it does not continue fetching the remaining items.

The `takewhile` approach:

sorted_states = sorted(states, key=len, reverse=True)
max_length = len(sorted_states[0])
list(takewhile(lambda x: max_length == len(x), sorted_states))

This also needs a sorted list, as well the length of the first item, but it stops the gathering of the new list as soon as the lambda expectation is not met anymore.

The `reduce` approach:

def _keep_longest(a, v):
  if len(a) == 0 or len(v) >= len(a[-1]):
    a.append(v)
  return a

sorted_states = sorted(states, key=len, reverse=True)
reduce(_keep_longest, sorted_states, [])

This one needs a method to handle the comparison between the previous length as well a sorted list. All of its length comparisons and moving list from lambda to lambda make this a non-efficient method.

Results

Of course the results vary a lot but doing them over and over again (sample size 50_000 on repl.it) i can say that they are representative (let them also run a few times on my cPython 3.5):

max and len 50_000 times: 1.3888958770003228
sort and groupby 50_000 times: 1.405984859000455
sort and takewhile 50_000 times: 1.4154430249991492
spencer 50_000 times: 1.607105290000618
wim 50_000 times: 1.9011182049998752
sort and reduce 50_000 times: 4.612968634999561
comprehension 50_000 times: 27.802522705999763

Conclusion

The max and len approach posted multiple times here takes the cake, and is probably the most pythonic way as it is self-explanatory without resorting to list sorting or the usage of the itertools, functools or collections libraries.

Online demo here

Upvotes: 1

Murman

Reputation: 1

Another potential solution. Pretty short and sweet

def longest_state(data):
    return [state for state in data if len(state) == len(max(data, key=len))]

Upvotes: -1

vash_the_stampede

Reputation: 4606

longest_state = max(states, key=len)

for i in states:
    if len(i) == len(longest_state):
        print(i)

Alternate format

longest_state = max(states, key=len)

[[print(i)] for i in states if len(i) == len(longest_state)]

Upvotes: 0

Ajinkya Chalke

Reputation: 51

Hope this helps. Two pass approach, may not be the best. But certainly is O(n).

def longest_state(states):
    max_len = len(max(states, key=len))
    return [state for state in states if len(state) == max_len]

1 pass would be best, but this looks shorter.

Upvotes: 2

Jonas Wolff

Reputation: 2244

s = len(max(states, key=len))
[i for i in states if len(i) == s]

Upvotes: 1

wim

Reputation: 362717

Key a dict off of the lengths:

>>> from collections import defaultdict
>>> len2states = defaultdict(list)
>>> for state in states:
...     len2states[len(state)].append(state)
...     
>>> len2states[max(len2states)]
['North Carolina', 'South Carolina']

Upvotes: 1

Spencer Bard

Reputation: 1035

You could store all of the longest names in an array

def longest_state(data):
    cur_longest = []
    cur_longest_num = 0
    for state in data:
        if len(state) == cur_longest_num:
            cur_longest.append(state)
        elif len(state) > cur_longest_num:
            cur_longest = [state]
            cur_longest_num = len(state)
    return cur_longest

Upvotes: 3

More than 1 longest string in list?

Answers (7)

The `groupby` approach:

The `takewhile` approach:

The `reduce` approach:

Other answers from this question

Results

Conclusion

Related Questions

More than 1 longest string in list?

Answers (7)

The groupby approach:

The takewhile approach:

The reduce approach:

Other answers from this question

Results

Conclusion

Related Questions

The `groupby` approach:

The `takewhile` approach:

The `reduce` approach: