Avadhesh
Avadhesh

Reputation: 4703

pythonic way to optimize the logic to filter/extract data from list

I have a list like below:

['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', 
'22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

From this list i want Three different list as a result. I want result using single iteration on list.

  1. list of all uids i.e [3234,3235,3236,3237,3241 .....]
  2. list of Seen uids i.e [3234,3235 ...] <-- uid of item which has \Seen Flag
  3. list of deleted uids i.e [3236,3253] <-- uid of item which has \Deleted Flag

Upvotes: 3

Views: 257

Answers (7)

ChessMaster
ChessMaster

Reputation: 549

This one works for your data sample....

uids, seen, deleted = [], [], []
for item in myList:
    uids.append(int(item[7:12]))
    if 'Se' in item[20:]:  seen.append(uids[-1])
    elif 'De' in item[20:]: deleted.append(uids[-1])

Upvotes: 1

Fbo
Fbo

Reputation: 16737

import re

data = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', 
'22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

r = re.compile('\d+\s\(UID\s(?P<uid>\d+)\sFLAGS\s\((?P<data>.*)\)\)')
uid_list = []
seen_uid_list = []
deleted_uid_list = []
for s in data:
    m = r.match(s)
    if m:
        uid_list.append(m.group('uid'))
        if m.group('data').rfind('Seen') > 0: seen_uid_list.append(m.group('uid'))
        if m.group('data').rfind('Deleted') > 0: deleted_uid_list.append(m.group('uid'))

print uid_list
print seen_uid_list
print deleted_uid_list

Upvotes: 2

salezica
salezica

Reputation: 76909

The only way I can think of of doing it in one iteration generating the three lists you ask, is by iterating manually. No python magic I can come up with.

You can easily improve this if you know specifics about the format and how it's generated. I don't know why +FLAGS and -FLAGS in some items, for example, and didn't know when to expect parenthesis, so I had to use find(). Also, I could've just split() the string in two, but then again, I don't know what the flag format means,...

def parseList(l):
    lall = []
    lseen = []
    ldeleted = []

    for item in l:
        spl = item.split()

        uid = int(spl[2])

        lall.append(uid)

        for word in spl[4:]:
            if word.find("\Seen") != -1:
                lseen.append(uid)

            elif word.find("\Deleted") != -1:
                ldeleted.append(uid)

    return lall, lseen, ldeleted

Upvotes: 0

ghostdog74
ghostdog74

Reputation: 342323

all=[]
seen=[]
deleted=[]
for item in alist:
    s=item.split(" ",4)
    all.append(s[2])
    if "seen" in s[-1].lower():
        seen.append(s[2])
    elif "delete" in s[-1].lower():
        deleted.append(s[2])

Upvotes: 0

shahjapan
shahjapan

Reputation: 14335

all,deleted,seen = [list(filter(None, a)) for a in \
    zip(*map(lambda a: (a[2], '\Deleted' in a[-1] and a[2], '\Seen' in  a[-1] and a[2]), map(lambda a: a.split(' '), items)))]

which will be faster using re or without re - you need to check with timeit !!!

Upvotes: 1

David Webb
David Webb

Reputation: 193686

The best thing to do would be to turn your data into a dict mapping UID to FLAGS, then searching it will be easy. So the data will look something like this:

{'3254': '', '3304': '', '3236': '\\Deleted', '3237': '-FLAGS \\Seen +FLAGS', '3234': 'seen \\Seen', '3235': '\\Seen', '3430': '\\Seen', '3431': '', '3252': '\\Seen', '3253':'\\Deleted', '3478': '', '3479': '', '3256': '\\Seen', '3481': '', '3480': '', '3318': '\\Seen', '3434': '\\Seen', '3243': '\\Seen', '3242': '\\Seen', '3241': '-FLAGS \\Seen +FLAGS', '3247': '\\Seen', '3245': '\\Seen', '3244': '\\Seen', '3447': '-FLAGS \\Seen +FLAGS'}

You can do this using a Regular Expression to match each entry in the list. If we get the regexp to return two groups in the match we can easily build the dict.

So we end up with something like this:

items = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))',  '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))',  '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', '13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', '18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', '22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

import re
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")
values = dict(pattern.match(item).groups() for item in items)

We can then easily query the items in values to get what you want:

print "All UIDs:",values.keys()
print "Seen UIDs:",[uid for uid,flags in values.iteritems() if r"\Seen" in flags]
print "Deleted UIDs:",[uid for uid,flags in values.iteritems() if r"\Deleted" in flags]

Upvotes: 3

Noufal Ibrahim
Noufal Ibrahim

Reputation: 72745

I'm not sure about list comprehensions since those usually map one list to another (using either filtering or mapping). I've not seen them being used to split lists. However, you could do this with a combination of a genexp and a loop in a single iteration. I've blown this up a little so that it's clear.

import re
grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS (?P<flags>\(.*\))\)')

t = [..] #your list

items = (grepper.search(m).groupdict() for m in t)

all = []
seen = []
deleted = []
for i in items:
  if "Seen" in i:
    seen.append(i["uid"])
  if "Deleted" in i:
    deleted.append(i["uid"])
  all.append(i["uid"])

You should have your 3 lists now.

Upvotes: 1

Related Questions