sar

Reputation: 81

Create a list of unique image IDs based on duplicate dictionary and creation date

Suppose I have a dictionary of duplicate image IDs:

dict_duplicates = {0: [6], 1: [3], 2: [7], 3: [1], 4: [5], 5: [4], 6: [0], 7: [2]}

Each key is an image ID and its value lists the IDs of that image's duplicates. For example, image 0 lists image 6 as a duplicate and, in reverse, image 6 lists image 0.
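Because every pair shows up twice, it can help to collapse the dictionary into duplicate groups first. A minimal sketch, assuming each list is complete (every image in a group lists all the others); the name `groups` is only illustrative:

```python
dict_duplicates = {0: [6], 1: [3], 2: [7], 3: [1], 4: [5], 5: [4], 6: [0], 7: [2]}

# Each image plus its listed duplicates forms one group. A frozenset makes the
# two mirrored entries (0 -> [6] and 6 -> [0]) compare equal, so the outer set
# keeps only one copy of each group.
groups = {frozenset([img_id, *dups]) for img_id, dups in dict_duplicates.items()}

print(sorted(sorted(g) for g in groups))
# [[0, 6], [1, 3], [2, 7], [4, 5]]
```

If the lists were not complete (for example, chains such as 0 -> [6], 6 -> [9]), this would need a proper connected-components pass instead.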

I also have a table that lists each image ID and the date it was created:

[Image: table of image IDs and creation dates]

How can I create a list of unique images, keeping the image with the earliest creation date from each set of duplicates?

To clarify, this is what I was doing:

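ordered_dict_duplicates = {6: [0], 3: [1], 7: [2], 1: [3], 5: [4], 4: [5], 0: [6], 2: [7]}
print(ordered_dict_duplicates)
dups = set()
for key, value in ordered_dict_duplicates.items():
    print(key)  # every key that is visited
    if key not in dups:
        dups = dups.union(value)
        print(dups)  # printed only when the key was not already marked as a duplicate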

Output:

{6: [0], 3: [1], 7: [2], 1: [3], 5: [4], 4: [5], 0: [6], 2: [7]}
6
{0}
3
{0, 1}
7
{0, 1, 2}
1
5
{0, 1, 2, 4}
4
0
2

This is where it "breaks".

The problem is that image 3 is the earliest version of the image (9/18), whereas image 4 is dated 9/22.

Upvotes: 0

Views: 81

Answers (1)

msimons

Reputation: 505

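This is pretty much the whole code that you are looking for. With the values defined in dict_duplicates and the sample dates below, result is just {0, 2}: the comparison is strict, so pairs whose two images share the same timestamp (1/3 and 4/5) add nothing.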

import pandas as pd

dict_duplicates = {0: [6], 1: [3], 2: [7], 3: [1], 4: [5], 5: [4], 6: [0], 7: [2]}

# Sample data; the name "data" avoids shadowing the built-in dict
data = {'img_id': [0, 1, 2, 3, 4, 5, 6, 7], 'date': ["2020-09-18_23:03:03", "2020-09-18_23:03:03", "2020-09-18_23:03:03", "2020-09-18_23:03:03", "2020-09-22_02:21:22", "2020-09-22_02:21:22", "2020-09-22_02:21:22", "2020-09-22_02:21:22"]}
df = pd.DataFrame(data)

# Parse the timestamps once and index them by image ID for direct lookup
dates = pd.to_datetime(df.set_index("img_id")["date"], format="%Y-%m-%d_%H:%M:%S")

result = set()

for key, value in dict_duplicates.items():
    # Keep an image only if it is strictly older than its first listed duplicate
    if dates.loc[key] < dates.loc[value[0]]:
        result.add(key)

print(result)  # {0, 2}
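If one image should be kept from every duplicate group, including groups whose images share a timestamp, a tie-break is needed. The following is only a sketch under stated assumptions: the dates are the sample values used above (not the real table from the question), ties go to the lower image ID, each duplicate list is assumed to be complete, and `earliest_unique` is a hypothetical helper name. It uses only the standard library.

```python
from datetime import datetime

dict_duplicates = {0: [6], 1: [3], 2: [7], 3: [1], 4: [5], 5: [4], 6: [0], 7: [2]}

# Sample dates from this answer (assumed, not the asker's real table):
# images 0-3 were created on 2020-09-18, images 4-7 on 2020-09-22.
FMT = "%Y-%m-%d_%H:%M:%S"
early, late = "2020-09-18_23:03:03", "2020-09-22_02:21:22"
dates = {i: datetime.strptime(early if i < 4 else late, FMT) for i in range(8)}


def earliest_unique(duplicates, dates):
    """Return the IDs that are the earliest image within their own group."""
    keep = []
    for img_id, dups in duplicates.items():
        group = [img_id, *dups]
        # Sort key is (date, id): earliest date wins, lower ID breaks ties.
        winner = min(group, key=lambda i: (dates[i], i))
        if winner == img_id:
            keep.append(img_id)
    return sorted(keep)


unique_ids = earliest_unique(dict_duplicates, dates)
print(unique_ids)
# [0, 1, 2, 4]
```

Each image is compared against its whole duplicate list rather than only `value[0]`, so the same function also covers groups of three or more as long as every member lists all the others.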

Upvotes: 1
