Reputation: 7410
I have a list of lists in Python as follows:
[['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
['1490026791.59', '2010113820', 'amazon.com', '208.67.222.222'],
['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
['1490026791.37', '150612899', 'google.com', '208.67.220.220'],
['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
['1490026791.33', '150612899', 'google.com', '8.8.4.4'],
['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
['1490026791.57', '2017032001', 'intuit.com', '208.67.220.220'],
['1490026791.47', '2017032001', 'intuit.com', '208.67.222.222'],
['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
Column 1: epoch_time
Column 2: serial_number
Column 3: domain
Column 4: server
How would I iterate through the list of lists so that, for each domain, any row whose serial_number equals the serial_number reported by 8.8.8.8 is deleted (keeping the 8.8.8.8 row itself), so the final output is as follows:
[['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
Upvotes: 1
Views: 83
Reputation: 3787
You can collect all the serial_number values associated with your server (8.8.8.8) and then skip them with an if condition while building the new list.
data=[['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
['1490026791.59', '2010113820', 'amazon.com', '208.67.222.222'],
['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
['1490026791.37', '150612899', 'google.com', '208.67.220.220'],
['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
['1490026791.33', '150612899', 'google.com', '8.8.4.4'],
['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
['1490026791.57', '2017032001', 'intuit.com', '208.67.220.220'],
['1490026791.47', '2017032001', 'intuit.com', '208.67.222.222'],
['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
serv='8.8.8.8'
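# Serials reported by serv; list() keeps the result reusable on Python 3, where filter() returns a one-shot iterator
fil=list(filter(None,map(lambda x: x[1] if x[3]==serv else None, data)))
print([i for i in data if i[1] not in fil or i[3] == serv])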
Output:
[['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'], ['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'], ['1490026791.35', '150612898', 'google.com', '208.67.222.222'], ['1490019411.19', '150612899', 'google.com', '8.8.8.8'], ['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'], ['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
Timing the solutions: with a list comprehension (as in a few of the other solutions),
7.9870223999e-05
With lambda and map,
4.81605529785e-05
This shouldn't be a problem in this case, but when the data set is large, time does matter. Hope it helps!
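The figures above don't say how they were measured. As a rough sketch only, the standard timeit module can compare two variants on the same data; the function names with_list and with_set are made up for this illustration, and the set-based variant is an assumption of mine rather than something from the answer. Absolute numbers will differ from machine to machine.

```python
import timeit

data = [['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
        ['1490026791.59', '2010113820', 'amazon.com', '208.67.222.222'],
        ['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
        ['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
        ['1490026791.37', '150612899', 'google.com', '208.67.220.220'],
        ['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
        ['1490026791.33', '150612899', 'google.com', '8.8.4.4'],
        ['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
        ['1490026791.57', '2017032001', 'intuit.com', '208.67.220.220'],
        ['1490026791.47', '2017032001', 'intuit.com', '208.67.222.222'],
        ['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
        ['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
serv = '8.8.8.8'

def with_list(rows):
    # Serials reported by serv, kept in a list (each lookup scans the list).
    serials = [r[1] for r in rows if r[3] == serv]
    return [r for r in rows if r[3] == serv or r[1] not in serials]

def with_set(rows):
    # Same logic, but a set gives constant-time lookups on average.
    serials = {r[1] for r in rows if r[3] == serv}
    return [r for r in rows if r[3] == serv or r[1] not in serials]

# Both variants must agree before their timings are worth comparing.
assert with_list(data) == with_set(data)

for fn in (with_list, with_set):
    seconds = timeit.timeit(lambda: fn(data), number=10000)
    print('%s: %.6f s for 10000 runs' % (fn.__name__, seconds))
```

On twelve rows the difference is likely to be noise; the set should only start to pay off once many distinct serials have to be checked.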
Upvotes: 1
Reputation: 10951
Just create your filter list, then apply the filtering on a list comprehension:
>>> l = [['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
['1490026791.59', '2010113820', 'amazon.com', '208.67.222.222'],
['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
['1490026791.37', '150612899', 'google.com', '208.67.220.220'],
['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
['1490026791.33', '150612899', 'google.com', '8.8.4.4'],
['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
['1490026791.57', '2017032001', 'intuit.com', '208.67.220.220'],
['1490026791.47', '2017032001', 'intuit.com', '208.67.222.222'],
['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
>>>
>>> ip_check = '8.8.8.8'
>>> filter_serials = [lst[1] for lst in l if lst[3] == ip_check]
>>> filter_serials
['2010113820', '150612899', '2017032001']
>>>
>>> output_list = [lst for lst in l if lst[3] == ip_check or lst[1] not in filter_serials]
>>>
>>> for lst in output_list:
...     print(lst)
...
['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220']
['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8']
['1490026791.35', '150612898', 'google.com', '208.67.222.222']
['1490019411.19', '150612899', 'google.com', '8.8.8.8']
['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4']
['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']
Upvotes: 1
Reputation: 27869
This should do it:
a = [['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
['1490026791.59', '2010113820', 'amazon.com', '208.67.222.222'],
['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
['1490026791.37', '150612899', 'google.com', '208.67.220.220'],
['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
['1490026791.33', '150612899', 'google.com', '8.8.4.4'],
['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
['1490026791.57', '2017032001', 'intuit.com', '208.67.220.220'],
['1490026791.47', '2017032001', 'intuit.com', '208.67.222.222'],
['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
remove = [item[1] for item in a if item[3]=='8.8.8.8']
clean = [item for item in a if item[1] not in remove or item[3]=='8.8.8.8']
print(clean)
Upvotes: 2
Reputation: 140168
I would sort the list so that rows with address 8.8.8.8 appear at the start, then iterate through the list, recording the key (serial, domain) on insertion so that each key is inserted only once.
l = [['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
['1490026791.59', '2010113820', 'amazon.com', '208.67.222.222'],
['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
['1490026791.37', '150612899', 'google.com', '208.67.220.220'],
['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
['1490026791.33', '150612899', 'google.com', '8.8.4.4'],
['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
['1490026791.57', '2017032001', 'intuit.com', '208.67.220.220'],
['1490026791.47', '2017032001', 'intuit.com', '208.67.222.222'],
['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]
inserted = set()
result = []
for row in sorted(l,key=lambda r: r[3]!="8.8.8.8"):
    timestamp,serial,domain,server = row
    k = (serial,domain)
    if k in inserted:
        pass   # already in result: skip
    else:
        result.append(row)
        inserted.add(k)
results in:
[['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'], ['1490019411.19', '150612899', 'google.com', '8.8.8.8'], ['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8'], ['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'], ['1490026791.35', '150612898', 'google.com', '208.67.222.222'], ['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4']]
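The sort key in that loop may look odd at first, so here is a small sketch of what it does, using four of the rows from the question (the variable names rows, ordered and servers are only for this illustration). The key is a boolean, and False sorts before True, so the 8.8.8.8 rows move to the front.

```python
rows = [
    ['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
    ['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
    ['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
    ['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
]

# The key is False for 8.8.8.8 rows and True for all others; False sorts first.
ordered = sorted(rows, key=lambda r: r[3] != "8.8.8.8")

# sorted() is stable, so rows with equal keys keep their original relative order.
servers = [r[3] for r in ordered]
print(servers)
# ['8.8.8.8', '8.8.8.8', '8.8.4.4', '208.67.222.222']
```

This is also why the result above lists the three 8.8.8.8 rows first rather than in the original order.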
Upvotes: 1
Reputation: 54223
You didn't write any code, so I won't either.
Initialize serial_numbers (an empty set, for example).
Iterate on the list. Add the serial number to serial_numbers if ip is 8.8.8.8.
Iterate on the list a second time, with a list comprehension. Keep the row if ip is 8.8.8.8 or if serial_number isn't in serial_numbers.
It will be short to write and fast to run.
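For reference, the steps above could be sketched roughly like this. The function name drop_matching_serials and the reference_ip parameter are made up for illustration, and using a set for serial_numbers is an assumption.

```python
def drop_matching_serials(rows, reference_ip='8.8.8.8'):
    # First pass: collect the serial numbers reported by the reference server.
    serial_numbers = set()
    for _time, serial, _domain, ip in rows:
        if ip == reference_ip:
            serial_numbers.add(serial)
    # Second pass: keep the reference rows and any row whose serial differs.
    return [row for row in rows
            if row[3] == reference_ip or row[1] not in serial_numbers]

data = [['1490011456.91', '2010113819', 'amazon.com', '208.67.220.220'],
        ['1490026791.59', '2010113820', 'amazon.com', '208.67.222.222'],
        ['1490026791.57', '2010113820', 'amazon.com', '8.8.4.4'],
        ['1490026791.55', '2010113820', 'amazon.com', '8.8.8.8'],
        ['1490026791.37', '150612899', 'google.com', '208.67.220.220'],
        ['1490026791.35', '150612898', 'google.com', '208.67.222.222'],
        ['1490026791.33', '150612899', 'google.com', '8.8.4.4'],
        ['1490019411.19', '150612899', 'google.com', '8.8.8.8'],
        ['1490026791.57', '2017032001', 'intuit.com', '208.67.220.220'],
        ['1490026791.47', '2017032001', 'intuit.com', '208.67.222.222'],
        ['1490026791.45', '2017032000', 'intuit.com', '8.8.4.4'],
        ['1490026791.43', '2017032001', 'intuit.com', '8.8.8.8']]

result = drop_matching_serials(data)
for row in result:
    print(row)
```

Note that this compares serials across all domains at once. If two domains could share a serial, storing (domain, serial) pairs in the set instead would restrict the comparison to each domain.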
Upvotes: 1