tarek hassan

Reputation: 802

Python intersection and difference sizes don't add up to the length of the original list

I have two lists, one with old IDs and one with new IDs.

I want to find the items the two lists have in common and the items they don't.

The new_Items list holds all the new IDs, and old_Items holds the old ones.

I assumed that the number of items in common plus the number of items that are in new_Items but not in old_Items would equal the total number of new items.
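That expectation does hold when everything is counted on sets: each distinct ID in new_Items is either also in old_Items or it isn't, so the intersection and the difference split set(new_Items) into two non-overlapping parts. A minimal sketch with small, made-up lists (the IDs below are hypothetical, not the question's data, and contain no duplicates):

```python
# Hypothetical sample IDs, not the question's data; no duplicates here.
old_items = ["a1", "b2", "c3", "d4"]
new_items = ["b2", "c3", "e5", "f6", "g7"]

new_set = set(new_items)
old_set = set(old_items)

common = new_set & old_set       # IDs present in both lists
not_common = new_set - old_set   # IDs present only in new_items

# Every distinct new ID lands in exactly one of the two results.
assert common.isdisjoint(not_common)
assert common | not_common == new_set
print(len(common), len(not_common), len(new_set))  # 2 3 5
```

Note that the identity is about len(set(new_items)), not len(new_items); the two only agree when the list has no repeated entries.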

Here is the code and the output.

print(old_Items)
print(new_Items)

common          =  set(new_Items) & set(old_Items) 
not_common      =  set(new_Items) - set(old_Items)
print(len(old_Items))
print(len(new_Items))
print(len(common))
print(len(not_common))

Output:

['312064913440', '312062038159', '382373644951', '312061362147', '312063436815', '382376480677', '382376472268', '382377376960', '382377376948', '312064169607', '312064914150', '312064169620', '312064169613', '382376480674', '382376472280', '382378338388', '312061362154', '312063426996', '382377376961', '312064912982', '312064912973', '312063426974', '312063427017', '312063427025', '312063436813', '312064913415', '382378337435', '382378337746', '382378337752', '382378338374', '382378338378', '382378338385', '382378338387', '382378338389', '382378338392', '312063436814', '312064169626', '312064912968', '312064912971', '312064912972', '312064912981', '312064913414', '312064913435', '312064914151', '312064914158', '382376480665', '382378337434', '382378337437', '382378337449', '382378337456', '382378337737', '382378337757', '312063426962', '382376480681', '382376472292', '382376480675', '382377376955', '312064914146', '382378337735', '312064912964', '312064913436', '312064914160', '382376472265', '382378337443', '382378337738', '382378337740', '312063436819', '382376472311', '382376480678', '382376480667', '312063426963', '312063426969', '312063426988', '312063426991', '312063427011', '312063427027', '312063436817', '312064169618', '312064169622', '312064169623', '312064912959', '312064912966', '312064912974', '312064912975', '312064912976', '312064912979', '312064912980', '312064912985', '312064913416', '312064913417', '312064913420', '312064913424', '312064913427', '312064913437', '312064913439', '312064913442', '312064914148', '312064914155', '312064914162', '312064914163', '312064914164', '312064914166', '382376472307', '382376480658', '382376480679', '382377376950', '382378337438', '382378337442', '382378337444', '382378337445', '382378337446', '382378337448', '382378337455', '382378337458', '382378337460', '382378337739', '382378337742', '382378337745', '382378337748', '382378337749', '382378337750', '382378337756', '382378337758', '382378337759', 
'382378337765', '382378338361', '382378338363', '382378338372', '382378338373', '382378338377', '382378338379', '382378338382', '382378338383', '382378338384', '312062038160', '312063426970', '312063427014', '312063427022', '312063436820', '312063436821', '312063436822', '312064169625', '312064169630', '312064912962', '312064912963', '312064912969', '312064912978', '312064912983', '312064912984', '312064912986', '312064912987', '312064912988', '312064913419', '312064913425', '312064913432', '312064913438', '312064914147', '312064914154', '312064914159', '312064914161', '382376472276', '382376472282', '382376472297', '382376472308', '382376480659', '382376480663', '382376480670', '382376480673', '382376480676', '382376480684', '382376480686', '382376480687', '382377376951', '382378337433', '382378337436', '382378337439', '382378337447', '382378337450', '382378337451', '382378337452', '382378337454', '382378337457', '382378337736', '382378337741', '382378337743', '382378337747', '382378337751', '382378337754', '382378337760', '382378337761', '382378337763', '382378337764', '382378338362', '382378338365', '382378338366', '382378338367', '382378338368', '382378338369', '382378338370', '382378338371', '382378338381', '382378338386', '382378338390', '312063426985', '312064169612', '382376480671', '312063427019', '312064169608', '312064169610', '312063436828', '312064169619', '382378337755', '312062714117', '312063436833', '312064169611', '382373643627', '382376472281', '382376472287', '382376472301', '382376472302', '382376480661', '382377376952', '382377376954', '382377376956', '382377376957', '382377376959', '382378337459', '312063426973', '312063427005', '312063436826', '312064169606', '312064169624', '312064169628', '382373643615', '382376472288', '382376480666', '382376480669', '382376480682', '312063427002', '312063436831', '312064169614', '312064169615', '382376480662', '382377376947', '312063426998', '382376480664', '382376480668', '382377376958', '312063426992', 
'312063436810', '312064169605', '312064912970', '312064913418', '312064913429', '312064913431', '382376480660', '382378337753', '382378338364', '382378338380', '312063426964', '312063426957', '312063436809', '312063436812', '382376472298', '382378338393', '382376480680', '312064169629', '312064913423', '312064914152', '312064914157', '312064914165', '382378338375', '382378338376', '312063426977', '312063426978', '382376472279', '312063436827', '382376472275', '382377376949', '312063427001', '312063436825', '312063436829', '312063436830', '312063426989', '312063426993', '312064169609', '382375693533', '382376472267', '382376472299', '382376480685', '312063436832']
['312065926243', '382376472268', '312067111164', '382378338380', '312064913415', '382380706562', '382380706577', '382380706899', '382379331671', '382376480673', '382376480674', '312067111153', '382380706584', '382378337450', '382378337454', '382376472301', '312067111663', '382378337459', '382379835966', '382379835959', '382379835961', '382380706907', '382378337444', '382380706580', '382378337436', '312066454641', '312063426992', '312067111152', '382379335272', '382378337752', '382378337449', '382378337437', '312067111167', '312066454623', '312067111471', '382379835965', '382380706919', '312066454621', '312067111158', '312067111163', '312067111468', '312067111647', '382380706718', '382380706732', '312067111150', '312067111446', '382379331513', '382379835967', '312067111436', '312067111462', '312067111464', '312067111466', '312067111468', '312067111647', '312067111652', '382380706583', '382380706718', '382380706723', '382380706732', '382380706894', '382380706897', '382380706912', '382379331513', '382379835967', '382378337435', '312064912968', '382378337456', '312064912971', '312064912972', '312064914151', '312066454616', '312066454639', '382378338378', '312064912981', '312067111435', '382376472292', '382378337434', '312064912973', '312064914158', '312067111169', '312067111443', '312067111646', '312067111676', '382380706567', '382380706559', '382380706572', '382380706719', '312064914160', '382378337443', '312064914146', '312067111442', '312067111441', '312067111463', '382378337735', '382376472265', '312063436819', '312067111441', '382376472311', '312064914155', '312063427014', '312063436822', '312064912984', '312066454628', '312063436817', '382378337756', '382376480670', '312064912962', '312064913438', '312066454629', '312066454634', '312066454635', '312066454645', '312067111143', '312067111451', '312067111452', '312067111454', '312067111467', '312067111470', '312067111650', '312067111653', '312067111654', '312067111662', '312067111665', '312067111671', 
'382379835960', '382379835962', '382379835968', '382379835971', '382380706573', '382380706727', '382380706728', '382380706915', '382380706917', '382380706920', '312065919161', '312066454625', '312067111147', '312067111156', '312067111159', '312067111457', '312067111458', '312067111460', '312067111461', '312067111651', '312067111667', '312067111672', '382379835958', '382380706574', '382380706722', '382380706901', '312064913432', '382378337433', '312067111154', '312067111165', '382380706892', '382378338379', '382378338365', '312064912988', '312067111455', '312067111465', '312067111657', '312067111660', '312067111664', '382378337447', '382380706729', '312063436828', '382378338377', '312064913427', '382378337438', '312064913442', '312064912987', '382378337452', '382378338362', '382378337455', '312064912979', '312067111168', '382380706717', '312063427011', '382378337750', '382378337458', '382378337743', '382378338373', '312067111140', '382379835974', '382380706565', '382380706734', '312064912975', '382378337446', '312064914162', '382378338382', '312064914166', '312063426998', '312064914166', '312063426998', '312063427019', '382378337754', '312064912963', '382378338369', '382379835964', '382376472282', '312064914148', '312066454618', '312066454619', '312066454626', '312066454631', '312067111141', '312067111166', '312067111447', '312067111453', '312067111456', '382379835972', '382380706716', '382380706724', '382380706736', '382380706913', '312066454630', '312066454633', '312066454636', '312066454643', '312067111151', '312067111157', '312067111449', '312067111469', '312067111656', '312067111658', '312067111669', '312067111670', '312067111675', '382379835970', '382380706566', '382380706575', '382380706582', '382380706725', '382380706726', '382380706730', '382380706733', '382380706898', '382380706903', '382380706905', '382380706906', '382378338372', '312066454620', '312066454637', '312067111162', '312067111666', '382379835953', '382380706570', '382380706578', '382380706896', 
'382380706916', '312066454617', '312066454622', '312066454632', '312067111145', '312067111146', '382379835954', '382379835963', '382380706576', '382378337765', '312063426969', '382379835969', '382378337451', '382378338368', '382378337448', '382378337442', '382378338371', '382378337439', '382378338386', '312064912986', '382376472307', '382376480687', '312064912976', '312064912983', '382378337457', '312065916615', '382379835952', '312066454615', '312066454627', '382379835955', '382380706561', '382380706571', '382380706714', '382378338366', '382380706564', '312064912974', '382378337460', '382380706581', '382376480660', '312063427002', '312064912978', '312067111439', '382380706900', '312067111160', '382379835951', '382380706721', '382380706908', '312067111438', '312067111649', '382380706560', '382380706895', '382380706918', '382378337445', '312064912959', '312064912966', '382376480680', '312063436809', '382376472298', '382379835957', '382379835973', '312063427001', '312063426977', '382378338393', '312063426957', '312063436830', '312063436812', '312063436829', '382376472275', '312063436825', '312064913423', '382376472299', '382376472267', '312063436832', '312064914157', '382378338375', '312064914165', '382378338376', '312064914152']
291 # number of items in old_items
327 # number of items in new_items
122 # intersection result
196 # result of newitems set - olditems set

Upvotes: 1

Views: 140

Answers (3)

tarek hassan

Reputation: 802

The list had some repeated items; that was the problem.

Converting it to a set drops the repeats, which is why the counts came out lower than I expected: 122 + 196 = 318 distinct IDs in new_Items, so 9 of its 327 entries are duplicates.
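A quick way to confirm this kind of problem is to compare the list's length with its set's length and then list which IDs repeat. A minimal sketch using collections.Counter; the sample list is hypothetical:

```python
from collections import Counter

# Hypothetical sample with one repeated ID.
new_items = ["b2", "c3", "e5", "e5", "f6"]

print(len(new_items), len(set(new_items)))  # 5 4

# Counter tallies each ID; anything seen more than once is a duplicate.
counts = Counter(new_items)
duplicates = {item: n for item, n in counts.items() if n > 1}
print(duplicates)  # {'e5': 2}

# To drop duplicates while keeping first-seen order, dict.fromkeys works
# because dicts preserve insertion order (guaranteed since Python 3.7).
unique_new_items = list(dict.fromkeys(new_items))
print(unique_new_items)  # ['b2', 'c3', 'e5', 'f6']
```

Whether duplicates should be removed or kept depends on what the IDs represent; the check above only reports them.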

Upvotes: 0

My3

Reputation: 140

A - B gives the items of A that are not in B, and B - A gives the items of B that are not in A. Neither of these on its own is what you are looking for.

For the items that are not common, you need (A union B) minus (A intersection B), which is the symmetric difference of the two sets.

You can also get it as (A - B) union (B - A).
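The formulations in this answer produce the same set. A small check with two hypothetical sets A and B:

```python
# Hypothetical example sets.
A = {1, 2, 3, 4}
B = {3, 4, 5}

only_in_a = A - B   # {1, 2}
only_in_b = B - A   # {5}

via_union_minus_intersection = (A | B) - (A & B)
via_two_differences = only_in_a | only_in_b
via_operator = A ^ B  # built-in symmetric difference operator

print(sorted(via_operator))  # [1, 2, 5]
assert via_union_minus_intersection == via_two_differences == via_operator
```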

Upvotes: 0

cs95

Reputation: 402932

What you're looking for is called the "symmetric difference".

set(new_Items) ^ set(old_Items)

Or,

set(new_Items).symmetric_difference(old_Items)

This gives you items that belong to either set, but not both. You are currently computing only those items that belong to new_Items, but not the other way round, hence the discrepancy.
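One practical difference between the two spellings: the symmetric_difference method accepts any iterable, so old_Items can be passed as a plain list, while the ^ operator needs a set on both sides. A short sketch with hypothetical lists standing in for the question's data:

```python
# Hypothetical lists standing in for old_Items / new_Items.
old_items = ["a1", "b2", "c3"]
new_items = ["b2", "c3", "d4"]

# The method accepts any iterable, so a plain list works as the argument.
diff_method = set(new_items).symmetric_difference(old_items)

# The operator form requires both operands to be sets.
diff_operator = set(new_items) ^ set(old_items)
assert diff_method == diff_operator == {"a1", "d4"}

try:
    set(new_items) ^ old_items  # list on the right-hand side
except TypeError as exc:
    print("TypeError:", exc)
```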

Refer to the set.symmetric_difference docs.

Upvotes: 4
