Reputation: 1509
I have a list of all US state names.
states = ['Oklahoma', 'Kansas', 'North Carolina', 'Georgia', 'Oregon',
'Mississippi', 'Minnesota', 'Colorado', 'Alabama',
'Massachusetts', 'Arizona', 'Connecticut', 'Montana',
'West Virginia', 'Nebraska', 'New York', 'Nevada', 'Idaho',
'New Jersey', 'Missouri', 'South Carolina', 'Pennsylvania',
'Rhode Island', 'New Mexico', 'Alaska', 'New Hampshire',
'Tennessee', 'Washington', 'Indiana', 'Hawaii', 'Kentucky',
'Virginia', 'Ohio', 'Wisconsin', 'Maryland', 'Florida',
'Utah', 'Maine', 'California', 'Vermont', 'Arkansas', 'Wyoming',
'Louisiana', 'North Dakota', 'South Dakota', 'Texas',
'Illinois', 'Iowa', 'Michigan', 'Delaware']
I want to find the longest string in this list of items, which is easy enough with the following:
def longest_state(data):
return(max(states,key=len))
print(longest_state(states)
This returns "North Carolina", which has a length of 14. However, "South Carolina" is also 14, but is not returned.
I tried to use the following stackoverflow thread which has an example to find multiple longest strings using a list comprehension but I was unable to make it work... Python's most efficient way to choose longest string in list?
I also tried using if/else statements to append the list item to another variable if it equaled the length of the current longest item but was unsuccessful
Can anyone help?
Upvotes: 3
Views: 710
Reputation: 2445
This question got me wondering which of all the possible solutions would have the best performance, so i made a comparison between all of which came to my mind and were not posted already, and compared them to mine.
groupby
approach:sorted_states = sorted(states, key=len, reverse=True)
grouped_states = next(groupby(sorted_states, key=len))
list(grouped_states[1])
groupby
needs a sorted list to work properly, so there is the "overhead" of sorting the list beforehand, but most Python interpreters have heavily optimized sorting algorhythms. We stop the generator at the first group occurrence with next
, so it does not continue fetching the remaining items.
takewhile
approach:sorted_states = sorted(states, key=len, reverse=True)
max_length = len(sorted_states[0])
list(takewhile(lambda x: max_length == len(x), sorted_states))
This also needs a sorted list, as well the length of the first item, but it stops the gathering of the new list as soon as the lambda expectation is not met anymore.
reduce
approach:def _keep_longest(a, v):
if len(a) == 0 or len(v) >= len(a[-1]):
a.append(v)
return a
sorted_states = sorted(states, key=len, reverse=True)
reduce(_keep_longest, sorted_states, [])
This one needs a method to handle the comparison between the previous length as well a sorted list. All of its length comparisons and moving list from lambda to lambda make this a non-efficient method.
I included the other answers (The max
and len
from various posters, as well @Spencer Bard's, @Wim's and the other list comprehensions doing the len on the max scan over each comparison) in the tests as well to compare their performance
Of course the results vary a lot but doing them over and over again (sample size 50_000 on repl.it) i can say that they are representative (let them also run a few times on my cPython 3.5):
max and len 50_000 times: 1.3888958770003228
sort and groupby 50_000 times: 1.405984859000455
sort and takewhile 50_000 times: 1.4154430249991492
spencer 50_000 times: 1.607105290000618
wim 50_000 times: 1.9011182049998752
sort and reduce 50_000 times: 4.612968634999561
comprehension 50_000 times: 27.802522705999763
The max
and len
approach posted multiple times here takes the cake, and is probably the most pythonic way as it is self-explanatory without resorting to list sorting or the usage of the itertools
, functools
or collections
libraries.
Upvotes: 1
Reputation: 1
Another potential solution. Pretty short and sweet
def longest_state(data):
return [state for state in data if len(state) == len(max(data, key=len))]
Upvotes: -1
Reputation: 4606
longest_state = max(states, key=len)
for i in states:
if len(i) == len(longest_state):
print(i)
Alternate format
longest_state = max(states, key=len)
[[print(i)] for i in states if len(i) == len(longest_state)]
Upvotes: 0
Reputation: 51
Hope this helps. Two pass approach, may not be the best. But certainly is O(n).
def longest_state(states):
max_len = len(max(states, key=len))
return [state for state in states if len(state) == max_len]
1 pass would be best, but this looks shorter.
Upvotes: 2
Reputation: 2244
s = len(max(states, key=len))
[i for i in states if len(i) == s]
Upvotes: 1
Reputation: 362717
Key a dict off of the lengths:
>>> from collections import defaultdict
>>> len2states = defaultdict(list)
>>> for state in states:
... len2states[len(state)].append(state)
...
>>> len2states[max(len2states)]
['North Carolina', 'South Carolina']
Upvotes: 1
Reputation: 1035
You could store all of the longest names in an array
def longest_state(data):
cur_longest = []
cur_longest_num = 0
for state in data:
if len(state) == cur_longest_num:
cur_longest.append(state)
elif len(state) > cur_longest_num:
cur_longest = [state]
cur_longest_num = len(state)
return cur_longest
Upvotes: 3