Reputation: 81
Suppose I have a dictionary of duplicate image IDs:
dict_duplicates = {0: [6], 1: [3], 2: [7], 3: [1], 4: [5], 5: [4], 6: [0], 7: [2]}
Here image 0 has a list of duplicates that includes image 6 and, conversely, image 6 has a list of duplicates that includes image 0.
And I have a table that displays the image ID and the date it was created.
How can I create a list of unique images by earliest creation date?
To clarify, this is what I was doing:
dups = set()
for key, value in ordered_dict_duplicates.items():
    if key not in dups:
        dups = dups.union(value)
Output:
{6: [0], 3: [1], 7: [2], 1: [3], 5: [4], 4: [5], 0: [6], 2: [7]}
6
{0}
3
{0, 1}
7
{0, 1, 2}
1
5
{0, 1, 2, 4}
4
0
2
This is where it "breaks".
The problem is that image 3 is the earliest version of the image (9/18). Image 4 is dated (9/22).
Upvotes: 0
Views: 81
Reputation: 505
This is pretty much the whole code you are looking for. With the dates below, result is just {0, 2}: the strict < comparison drops pairs whose dates are identical (1/3 and 4/5), so neither image from those pairs is kept.
import pandas as pd
from datetime import datetime

dict_duplicates = {0: [6], 1: [3], 2: [7], 3: [1], 4: [5], 5: [4], 6: [0], 7: [2]}
# Named `data` rather than `dict` so the built-in type is not shadowed.
data = {'img_id': [0, 1, 2, 3, 4, 5, 6, 7], 'date': ["2020-09-18_23:03:03", "2020-09-18_23:03:03", "2020-09-18_23:03:03", "2020-09-18_23:03:03", "2020-09-22_02:21:22", "2020-09-22_02:21:22", "2020-09-22_02:21:22", "2020-09-22_02:21:22"]}
df = pd.DataFrame(data)

result = set()
for key, value in dict_duplicates.items():
    # Parse the creation dates of the image and of its first listed duplicate.
    date1 = datetime.strptime(df[df["img_id"] == key]["date"].values[0], "%Y-%m-%d_%H:%M:%S")
    date2 = datetime.strptime(df[df["img_id"] == value[0]]["date"].values[0], "%Y-%m-%d_%H:%M:%S")
    # Keep the image only if it is strictly older than that duplicate.
    if date1 < date2:
        result.add(key)
print(result)  # {0, 2}
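If you also want one image kept from pairs that share the same date, and want to handle lists with more than one duplicate, a possible variation is to pick the earliest member of each group and break ties by the lower image ID. This is only a sketch: the helper name earliest_unique and the plain dates dict (standing in for the table) are made up for illustration, the tie-break rule is an assumption, and it assumes every image lists all of its duplicates, as in the sample data.

```python
from datetime import datetime

dict_duplicates = {0: [6], 1: [3], 2: [7], 3: [1], 4: [5], 5: [4], 6: [0], 7: [2]}

# Same sample dates as above, as a plain dict standing in for the table.
dates = {
    0: "2020-09-18_23:03:03", 1: "2020-09-18_23:03:03",
    2: "2020-09-18_23:03:03", 3: "2020-09-18_23:03:03",
    4: "2020-09-22_02:21:22", 5: "2020-09-22_02:21:22",
    6: "2020-09-22_02:21:22", 7: "2020-09-22_02:21:22",
}


def earliest_unique(duplicates, dates, fmt="%Y-%m-%d_%H:%M:%S"):
    """Hypothetical helper: keep the earliest image of each duplicate group.

    Assumes each image lists all of its duplicates; ties on date are
    broken by the lower image ID (an arbitrary choice).
    """
    parsed = {img: datetime.strptime(d, fmt) for img, d in dates.items()}
    keep = set()
    for img, dups in duplicates.items():
        group = {img, *dups}  # the image together with all its listed duplicates
        keep.add(min(group, key=lambda i: (parsed[i], i)))
    return sorted(keep)


unique_images = earliest_unique(dict_duplicates, dates)
print(unique_images)  # [0, 1, 2, 4]
```

Because each group is evaluated as a whole, the result does not depend on the order in which the dictionary is iterated. If duplicates can chain (A lists B, B lists C, but A does not list C), the groups would need to be merged first, which this sketch does not do.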
Upvotes: 1