Reputation: 3797
I have a list that looks like this:
json_file_list = ['349148424_20180312071059_20190402142033.json','349148424_20180312071059_20190405142033.json','360758678_20180529121334_20190402142033.json']
and an empty list:
list2 = []
What I want to do is compare the characters up to the second underscore '_' and, where they match, append only the max of the full strings to the new list. In the example above, the first 2 entries are duplicates (up to the second underscore), so I want to pick the max based on the numbers after the second underscore. The final list2 would then have only 2 entries, not 3.
I tried this:
for row in json_file_list:
    if row[:24] == row[:24]:
        list2.append(max(row))
    else:
        list2.append(row)
but that is just returning:
['s', 's', 's']
Final output should be:
['349148424_20180312071059_20190405142033.json','360758678_20180529121334_20190402142033.json']
Any ideas? I also realize this code is brittle because of the way I am slicing (what happens if the strings get longer or shorter?), so I need a better way to do that, maybe basing it on the second underscore instead. The strings will always end with '.json'.
Upvotes: 0
Views: 44
Reputation: 33
I suppose you can slice out the part you want to compare and use the built-in 'set' to collect the unique prefixes:
set([x[:24] for x in json_file_list])
set(['360758678_20180529121334', '349148424_20180312071059'])
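It is then a simple matter of picking the largest full name for each prefix (the order of the result may vary, since sets are unordered):
list2 = []
for unique in set([x[:24] for x in json_file_list]):
    # keep the max of the full names that share this prefix
    list2.append(max(x for x in json_file_list if x[:24] == unique))
list2
['360758678_20180529121334_20190402142033.json',
 '349148424_20180312071059_20190405142033.json']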
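The fixed slice width of 24 only works while every name has the same layout. Here is a minimal sketch that derives the prefix from the second underscore instead, assuming every name contains at least two underscores. `prefix_key` is a hypothetical helper name, not something from the question.

```python
json_file_list = [
    '349148424_20180312071059_20190402142033.json',
    '349148424_20180312071059_20190405142033.json',
    '360758678_20180529121334_20190402142033.json',
]

def prefix_key(name):
    # Hypothetical helper: everything before the second underscore.
    return "_".join(name.split("_", 2)[:2])

# Unique prefixes, independent of how long each part is.
prefixes = {prefix_key(name) for name in json_file_list}

# For each prefix keep the largest full name; sort for a stable order.
list2 = sorted(
    max(n for n in json_file_list if prefix_key(n) == p)
    for p in prefixes
)
print(list2)
```

This rescans the whole list once per prefix, which is fine for short lists; for long ones, grouping in a dictionary avoids the repeated scans.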
Upvotes: 1
Reputation: 1858
I'd use a dictionary to do this:
from collections import defaultdict
d = defaultdict(list)
for x in json_file_list:
    d[tuple(x.split("_")[:2])].append(x)
new_list = [max(x) for x in d.values()]
new_list
Output:
['349148424_20180312071059_20190405142033.json',
'360758678_20180529121334_20190402142033.json']
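Taking `max` of the strings works here because the trailing timestamps are fixed-width digit strings, so string order matches numeric order. If that could not be relied on, one possible variation is to compare on the parsed number instead. This is only a sketch: `trailing_stamp` is a hypothetical helper, and it assumes the part after the last underscore is all digits followed by '.json'.

```python
from collections import defaultdict

json_file_list = [
    '349148424_20180312071059_20190402142033.json',
    '349148424_20180312071059_20190405142033.json',
    '360758678_20180529121334_20190402142033.json',
]

def trailing_stamp(name):
    # Hypothetical helper: integer value of the digits after the last underscore.
    stem = name[:-len(".json")] if name.endswith(".json") else name
    return int(stem.rsplit("_", 1)[1])

# Group the names by their first two underscore-separated fields.
groups = defaultdict(list)
for name in json_file_list:
    groups[tuple(name.split("_")[:2])].append(name)

# Within each group, keep the name with the largest trailing number.
new_list = [max(names, key=trailing_stamp) for names in groups.values()]
print(new_list)
```

Dictionaries keep insertion order in Python 3.7+, so the result follows the order in which each prefix first appears in the input.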
Upvotes: 1
Reputation: 9536
The if statement in this snippet:
for row in json_file_list:
    if row[:24] == row[:24]:
        list2.append(max(row))
    else:
        list2.append(row)
always evaluates to True. Think about it: how could row[:24] be different from itself? Because the condition is always True, each iteration appends max(row), which for a string is its single highest-ordered character (s in this case, from '.json'), to list2. That's why you're getting an output of ['s', 's', 's'].
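A quick check of both points, using the first entry from the question (the variable names below are only for illustration):

```python
row = '349148424_20180312071059_20190402142033.json'

# A slice always equals itself, so the else branch can never run.
always_true = row[:24] == row[:24]

# max() over a string compares individual characters, not whole names.
largest_char = max(row)

print(always_true, largest_char)
```

The digits, '_' and '.' all sort below the lowercase letters, and 's' is the highest of the letters in 'json'.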
Maybe I'm understanding your request incorrectly, but couldn't you just append all the elements of the row to a list and then remove duplicates?
for row in json_file_list:
    for elem in row:
        list2.append(elem)
list2 = sorted(list(set(list2)))
Upvotes: 1