Reputation: 95
I have a list of alerts that sometimes contains duplicates in German and English. I want to remove the duplicates from that list. An alert counts as a duplicate when another alert in the list has the same "start" and "end" timestamps; in that case I want to remove the whole entry ("description", "event", "start", ...) from the alerts list. In this example the second entry should be deleted:
{
"alerts": [
{
"description": "Es tritt leichter Frost auf.",
"end": 1613379600,
"event": "FROST",
"lang": "de",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of frost",
"end": 1613379600,
"event": "frost",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of wind gusts",
"end": 1613408400,
"event": "wind gusts",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613336400
}]}
How can I do this in Python and save the new alerts list without duplicates? I think it must be something like this (sorry for the pseudocode, I couldn't adapt the existing examples, I am a beginner). Please help, thanks a lot!
for item in data['alerts']:
    if item['start'] == item['start'] and item['end'] == item['end']:
        delete
So that I get this output:
{
"alerts": [
{
"description": "Es tritt leichter Frost auf.",
"end": 1613379600,
"event": "FROST",
"lang": "de",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of wind gusts",
"end": 1613408400,
"event": "wind gusts",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613336400
}]}
Upvotes: 0
Views: 896
Reputation: 6234
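You can group all alerts with identical timestamps using itertools.groupby [Python-docs] and then select the English-language entry from each group. Since groupby only merges adjacent items, the list is sorted by the same key first.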
from itertools import groupby

data["alerts"] = sorted(data["alerts"], key=lambda x: (x["end"], x["start"]))
data["alerts"] = [
    g
    for key, group in groupby(data["alerts"], key=lambda x: (x["end"], x["start"]))
    for g in group
    if g["lang"] == "en"  # change accordingly
]
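One caveat with the filter above: a timestamp group that contains no English alert is dropped entirely. Here is a minimal sketch, assuming you instead want exactly one alert per (start, end) pair, with a preferred language and a fallback. The helper name `pick_one` and its `preferred` parameter are hypothetical, and the sample alerts are shortened to the fields that matter here:

```python
from itertools import groupby


def pick_one(alerts, preferred="de"):
    """Keep one alert per (start, end) pair, preferring the given language."""
    def key(alert):
        return (alert["start"], alert["end"])

    result = []
    # groupby only merges adjacent items, so sort by the same key first
    for _, group in groupby(sorted(alerts, key=key), key=key):
        group = list(group)
        # take the preferred language if present, else the first alert in the group
        chosen = next((a for a in group if a["lang"] == preferred), group[0])
        result.append(chosen)
    return result


# shortened sample data (hypothetical subset of the fields in the question)
alerts = [
    {"event": "FROST", "lang": "de", "start": 1613322000, "end": 1613379600},
    {"event": "frost", "lang": "en", "start": 1613322000, "end": 1613379600},
    {"event": "wind gusts", "lang": "en", "start": 1613336400, "end": 1613408400},
]
deduped = pick_one(alerts)
print([a["event"] for a in deduped])
```

With `preferred="de"` the German frost alert is kept, and the wind gusts alert survives even though it only exists in English.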
Upvotes: 1
Reputation: 14233
Sort the input list by lang in reverse order, so that en comes before de. Then build a dict whose key is the tuple (start, end) and use dict.values(). Because de comes after en, when two alerts share the same (start, end) key, the de alert overwrites the en one.
data = {
"alerts": [
{
"description": "Es tritt leichter Frost auf.",
"end": 1613379600,
"event": "FROST",
"lang": "de",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of frost",
"end": 1613379600,
"event": "frost",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of wind gusts",
"end": 1613408400,
"event": "wind gusts",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613336400
}]}
unique = {
    (item['start'], item['end']): item
    for item in sorted(data['alerts'], key=lambda x: x['lang'], reverse=True)
}
data['alerts'] = sorted(unique.values(), key=lambda x: (x['start'], x['end']))
Output:
{
"alerts": [
{
"description": "Es tritt leichter Frost auf.",
"end": 1613379600,
"event": "FROST",
"lang": "de",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of wind gusts",
"end": 1613408400,
"event": "wind gusts",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613336400
}
]
}
I'm not sure whether you need the result sorted by time. If not, you can replace the final sorted(...) call with list(unique.values()).
Upvotes: 1
Reputation: 1071
You can do the filtering via dictionary comprehension:
data = {
"alerts": [
{
"description": "Es tritt leichter Frost auf.",
"end": 1613379600,
"event": "FROST",
"lang": "de",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of frost",
"end": 1613379600,
"event": "frost",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of wind gusts",
"end": 1613408400,
"event": "wind gusts",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613336400
}]}
filtered = {(entry["start"], entry["end"]): entry for entry in reversed(data["alerts"])}
data["alerts"] = list(filtered.values())
This approach relies on the fact that a duplicated dictionary key is overwritten by the last entry assigned to it. Remove the reversed() if you'd like to keep the last duplicated entry instead of the first one.
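A side effect of iterating with reversed() is that the resulting list comes out in reverse of the original order, since dicts keep insertion order (Python 3.7+). Here is a minimal sketch of a variant that keeps the first entry per timestamp pair and also preserves the original order, using dict.setdefault. The name `first_seen` is only illustrative, and the sample alerts are shortened to the relevant fields:

```python
# shortened sample data (hypothetical subset of the fields in the question)
alerts = [
    {"event": "FROST", "lang": "de", "start": 1613322000, "end": 1613379600},
    {"event": "frost", "lang": "en", "start": 1613322000, "end": 1613379600},
    {"event": "wind gusts", "lang": "en", "start": 1613336400, "end": 1613408400},
]

first_seen = {}
for alert in alerts:
    # setdefault stores the alert only if this (start, end) key is not present yet
    first_seen.setdefault((alert["start"], alert["end"]), alert)

deduped = list(first_seen.values())
print([a["event"] for a in deduped])
```

Because later duplicates never overwrite the stored value, no reversal is needed and the order of first occurrences is kept.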
Upvotes: 2
Reputation: 9047
Try keeping the timestamps you have already seen in a list. For each subsequent element, check whether its timestamps have already been visited and, if so, skip it.
data = {
"alerts": [
{
"description": "Es tritt leichter Frost auf.",
"end": 1613379600,
"event": "FROST",
"lang": "de",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of frost",
"end": 1613379600,
"event": "frost",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613322000
},
{
"description": "There is a risk of wind gusts",
"end": 1613408400,
"event": "wind gusts",
"lang": "en",
"sender_name": "DWD / Nationales Warnzentrum Offenbach",
"start": 1613336400
}]}
visited_timestamp = []
output = []
for each_message in data['alerts']:
    if (each_message['end'], each_message['start']) in visited_timestamp:
        pass  # timestamps already seen: skip this duplicate
    else:
        output.append(each_message)
        visited_timestamp.append((each_message['end'], each_message['start']))
data['alerts'] = output
print(data)
{
'alerts':
[
{'description': 'Es tritt leichter Frost auf.', 'end': 1613379600, 'event': 'FROST', 'lang': 'de', 'sender_name': 'DWD / Nationales Warnzentrum Offenbach', 'start': 1613322000},
{'description': 'There is a risk of wind gusts', 'end': 1613408400, 'event': 'wind gusts', 'lang': 'en', 'sender_name': 'DWD / Nationales Warnzentrum Offenbach', 'start': 1613336400}
]
}
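The same idea can be sketched with a set instead of a list, which makes the "already visited" check a constant-time lookup on average. The function name `dedupe_alerts` is hypothetical, and the sample alerts are shortened. Since the question also asks about saving the result, the sketch serializes it with json.dumps; writing to a file would use json.dump with an open file handle instead:

```python
import json


def dedupe_alerts(alerts):
    """Return the alerts with later (start, end) duplicates removed."""
    seen = set()
    unique = []
    for alert in alerts:
        key = (alert["start"], alert["end"])
        if key in seen:
            continue  # same timestamps already kept: skip this duplicate
        seen.add(key)
        unique.append(alert)
    return unique


# shortened sample data (hypothetical subset of the fields in the question)
data = {
    "alerts": [
        {"event": "FROST", "lang": "de", "start": 1613322000, "end": 1613379600},
        {"event": "frost", "lang": "en", "start": 1613322000, "end": 1613379600},
        {"event": "wind gusts", "lang": "en", "start": 1613336400, "end": 1613408400},
    ]
}
data["alerts"] = dedupe_alerts(data["alerts"])

# serialize the cleaned structure as JSON text
text = json.dumps(data, ensure_ascii=False, indent=2)
print(text)
```

The first alert for each timestamp pair wins, so with the data from the question the German frost alert is kept and the English one is dropped.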
Upvotes: 0