JD2775

Reputation: 3797

Returning max of string after comparison with other sub-strings - Python

I have a list that looks like this:

json_file_list = ['349148424_20180312071059_20190402142033.json','349148424_20180312071059_20190405142033.json','360758678_20180529121334_20190402142033.json']

and an empty list:

list2 = []

What I want to do is compare the characters up to the second underscore '_' and, if they are the same, append only the max of the full strings to the new list. In the case above, the first 2 entries are duplicates (up to the second underscore), so I want to base the max on the numbers after the second underscore. The final list2 would then have only 2 entries, not 3.

I tried this:

for row in json_file_list:
    if row[:24] == row[:24]:
        list2.append(max(row))
    else:
        list2.append(row)

but that is just returning:

['s', 's', 's']

Final output should be:

['349148424_20180312071059_20190405142033.json','360758678_20180529121334_20190402142033.json']

Any ideas? I also realize this code is brittle because of the way I am slicing it (what happens if the string gets longer or shorter?), so I need to come up with a better way to do that. Maybe base it off the second underscore instead. The strings will always end with '.json'.

Upvotes: 0

Views: 44

Answers (3)

Jason Miller

Reputation: 33

I suppose you can slice out the part you want to compare and use the built-in set to get the unique prefixes:

set([x[:24] for x in json_file_list])
set(['360758678_20180529121334', '349148424_20180312071059'])

It would then be a simple matter of joining the remaining text back on:

list2=[]
for unique in set([x[:24] for x in json_file_list]):
  list2.append(unique + json_file_list[0][24:])

list2
['360758678_20180529121334_20190402142033.json',
 '349148424_20180312071059_20190402142033.json']
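Note that the loop above reattaches the tail of the first entry to every prefix, so the first group comes back with ...20190402142033.json rather than the later ...20190405142033.json. A minimal sketch of one way to keep the largest full name per prefix instead, assuming every name contains exactly two underscores so that the key is everything before the last one (rsplit replaces the fixed 24-character slice; the variable names are only illustrative):

```python
json_file_list = [
    '349148424_20180312071059_20190402142033.json',
    '349148424_20180312071059_20190405142033.json',
    '360758678_20180529121334_20190402142033.json',
]

# Unique prefixes: everything before the last underscore (assumed key).
prefixes = sorted({name.rsplit('_', 1)[0] for name in json_file_list})

# For each prefix, keep the largest full name that starts with it.
list2 = [
    max(name for name in json_file_list if name.startswith(prefix + '_'))
    for prefix in prefixes
]
print(list2)
```

This scans the list once per prefix, which is fine for short lists; grouping in a dictionary avoids the repeated scan if the list is large.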

Upvotes: 1

Lante Dellarovere

Reputation: 1858

I'd use a dictionary to do this:

from collections import defaultdict

d = defaultdict(list)
for x in json_file_list:
    d[tuple(x.split("_")[:2])].append(x)


new_list = [max(x) for x in d.values()]
new_list

Output:

['349148424_20180312071059_20190405142033.json',
 '360758678_20180529121334_20190402142033.json']
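max(x) here compares the names as plain strings, which matches chronological order only because the timestamps all have the same width. If that is not guaranteed, one hedged variant is to compare on the trailing number explicitly. The helper name trailing_stamp is hypothetical, and the sketch assumes every name ends with '_<digits>.json':

```python
from collections import defaultdict

json_file_list = [
    '349148424_20180312071059_20190402142033.json',
    '349148424_20180312071059_20190405142033.json',
    '360758678_20180529121334_20190402142033.json',
]

def trailing_stamp(name):
    # Hypothetical helper: '..._20190405142033.json' -> 20190405142033
    stem = name[:-len('.json')]
    return int(stem.rsplit('_', 1)[1])

# Group the names by the first two underscore-separated fields.
groups = defaultdict(list)
for name in json_file_list:
    groups[tuple(name.split('_')[:2])].append(name)

# Within each group, pick the name with the largest trailing number.
new_list = [max(names, key=trailing_stamp) for names in groups.values()]
print(new_list)
```

On Python 3.7+ the dictionary keeps insertion order, so the result follows the order in which each prefix first appears.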

Upvotes: 1

Alec

Reputation: 9536

The if statement in this snippet:

for row in json_file_list:
    if row[:24] == row[:24]:
        list2.append(max(row))
    else:
        list2.append(row)

always evaluates to True: row[:24] can never be different from itself. Because the condition is always true, max(row) is appended on every iteration, and max of a string returns its largest single character, which is s here (from '.json'). That's why you're getting an output of ['s', 's', 's'].
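Both effects can be checked in isolation with a small snippet (row is just the first entry from the question):

```python
row = '349148424_20180312071059_20190402142033.json'

# A slice always equals itself, so the else branch can never run.
same = row[:24] == row[:24]
print(same)       # True

# max() on a string compares its characters one by one.
largest = max(row)
print(largest)    # s
```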

Maybe I'm understanding your request incorrectly, but couldn't you just append all the elements of the row to a list and then remove duplicates?

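for row in json_file_list:
    # Iterating over a string yields its individual characters
    for elem in row:
        list2.append(elem)
# Drop duplicate characters and sort what remains
list2 = sorted(set(list2))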

Upvotes: 1
