Dicky Raambo

Reputation: 523

Remove duplicates from a list of data in Python 2.7/Django

For example, I have a list called attendances that contains multiple records like:

[ <Attendance>: 11804 : 2018-07-18 12:22:55, <Attendance>: 11804 : 2018-07-18 12:23:04, <Attendance>: 2 : 2018-07-25 16:17:18, <Attendance>: 2 : 2018-07-25 16:17:20, <Attendance>: 2 : 2018-07-25 16:17:23, <Attendance>: 2 : 2018-07-25 16:27:52]

When I need to print it, I simply do:

for data in attendances:
    print 'User ID   : {}'.format(data.user_id)
    print 'Timestamp : {}'.format(data.timestamp) 

The result will be:

User ID   : 11804
Timestamp : 2018-07-18 12:22:55
User ID   : 11804
Timestamp : 2018-07-18 12:23:04
User ID   : 2
Timestamp : 2018-07-25 16:17:18
User ID   : 2
Timestamp : 2018-07-25 16:17:20
User ID   : 2
Timestamp : 2018-07-25 16:17:23
User ID   : 2
Timestamp : 2018-07-25 16:27:52

But that is not what I need, since it prints all the data. I only need to show the first record for each User ID.

Like this:

User ID   : 11804
Timestamp : 2018-07-18 12:22:55
User ID   : 2
Timestamp : 2018-07-25 16:17:18

Does anyone have an idea what I should do?

Upvotes: 1

Views: 45

Answers (1)

willeM_ Van Onsem

Reputation: 477200

With a query

You can write a query that returns a QuerySet of dictionaries, where every dictionary contains a 'user_id' key and a 'first_timestamp' key, like:

from django.db.models import Min

# One row per user_id, annotated with that user's earliest timestamp.
attendances = Attendance.objects.values('user_id').annotate(
    first_timestamp=Min('timestamp')
).order_by('user_id')

You can then iterate over the result, and print it like:

for data in attendances:
    print 'User ID   : {}'.format(data['user_id'])
    print 'Timestamp : {}'.format(data['first_timestamp'])

With a set that maintains the already seen users

In case it is not possible to write such a query (for example, because you are given a list), we can sort the items by timestamp first, and then maintain a set of already seen user ids:

from operator import attrgetter

sorted_attendances = sorted(attendances, key=attrgetter('timestamp'))
seen_users = set()

for attendance in sorted_attendances:
    if attendance.user_id not in seen_users:
        seen_users.add(attendance.user_id)
        print 'User ID   : {}'.format(attendance.user_id)
        print 'Timestamp : {}'.format(attendance.timestamp)

This approach is typically more expensive, however, since the database has to transfer every row instead of one row per user, so there is more data to send and more data to process in Python.
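
To try the set-based approach without a database, here is a minimal self-contained sketch. It assumes a namedtuple as a hypothetical stand-in for the Attendance model (the real one is a Django model), and the helper name first_per_user is made up for illustration. The print calls take a single argument, so the snippet should run on both Python 2.7 and Python 3.

```python
from collections import namedtuple
from datetime import datetime
from operator import attrgetter

# Hypothetical stand-in for the Django model; only the two fields
# used in the question are modelled here.
Attendance = namedtuple('Attendance', ['user_id', 'timestamp'])


def first_per_user(attendances):
    """Return the earliest attendance of each user, ordered by timestamp."""
    seen_users = set()
    firsts = []
    # Sorting by timestamp guarantees that the first record we meet
    # for a given user is that user's earliest one.
    for attendance in sorted(attendances, key=attrgetter('timestamp')):
        if attendance.user_id not in seen_users:
            seen_users.add(attendance.user_id)
            firsts.append(attendance)
    return firsts


# Sample data copied from the question.
attendances = [
    Attendance(11804, datetime(2018, 7, 18, 12, 22, 55)),
    Attendance(11804, datetime(2018, 7, 18, 12, 23, 4)),
    Attendance(2, datetime(2018, 7, 25, 16, 17, 18)),
    Attendance(2, datetime(2018, 7, 25, 16, 17, 20)),
    Attendance(2, datetime(2018, 7, 25, 16, 17, 23)),
    Attendance(2, datetime(2018, 7, 25, 16, 27, 52)),
]

for attendance in first_per_user(attendances):
    print('User ID   : {}'.format(attendance.user_id))
    print('Timestamp : {}'.format(attendance.timestamp))
```

Collecting the results in a list instead of printing inside the loop keeps the filtering reusable, for example to pass the records on to a template.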

Upvotes: 2
