Reputation: 35
I have a pandas DataFrame in which each row holds a list of tuples, and I want to merge the lists from all rows into a single list of tuples. The dataset has 10,000+ rows.
InvoiceNo Description
534 [(AB, AC), (ACBO, PPK)]
415 [(AD, AT), (CBO, PKD), (CBO, PKA)]
315 [(FDC, ATO), (VBO, IKD), (CVB, PKD)]
Desired output:
Edges = [(AB, AC), (ACBO, PPK), (AD, AT), (CBO, PKD), (CBO, PKA), (FDC, ATO), (VBO, IKD), (CVB, PKD)]
Upvotes: 1
Views: 102
Reputation: 4638
For pandas 0.25 and later, you can also use the explode method:
df['Description'].explode().tolist()
output:
[('AB', 'AC'), ('ACBO', 'PPK'), ('AD', 'AT'), ('CBO', 'PKD'), ('CBO', 'PKA'), ('FDC', 'ATO'), ('VBO', 'IKD'), ('CVB', 'PKD')]
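One thing to watch for with explode: a row whose list is empty becomes a NaN entry rather than disappearing. The sketch below rebuilds the sample data from the question and adds a hypothetical invoice 999 with an empty list (not in the original data) to show how dropna() keeps such rows out of the result.

```python
import pandas as pd

# Sample data from the question, plus a hypothetical invoice (999)
# with an empty list to illustrate the NaN behaviour of explode().
df = pd.DataFrame({
    "InvoiceNo": [534, 415, 315, 999],
    "Description": [
        [("AB", "AC"), ("ACBO", "PPK")],
        [("AD", "AT"), ("CBO", "PKD"), ("CBO", "PKA")],
        [("FDC", "ATO"), ("VBO", "IKD"), ("CVB", "PKD")],
        [],
    ],
})

# explode() turns each list element into its own row; the tuples stay intact.
exploded = df["Description"].explode()

# The empty list produced one NaN row, so drop it before converting to a list.
edges = exploded.dropna().tolist()
print(edges)
```

If every row is guaranteed to have at least one tuple, the dropna() call is unnecessary.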
Upvotes: 2
Reputation: 156
With that number of rows, do duplicate edges cause problems for your application?
If they do, consider a set instead of a list. You can then use jezrael's beautiful comprehension one-liner with {} instead of []:
Edges = {y for x in df.Description for y in x}
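Note that a set does not keep the original order of the edges. If order matters, one possible alternative is to deduplicate with dict.fromkeys, which keeps the first occurrence of each edge (dicts preserve insertion order since Python 3.7). This is only a sketch: edge_lists is a hypothetical plain list standing in for df.Description, with one duplicate edge added for illustration.

```python
# Hypothetical stand-in for df.Description; ("AB", "AC") appears twice.
edge_lists = [
    [("AB", "AC"), ("ACBO", "PPK")],
    [("AD", "AT"), ("AB", "AC")],
]

# Flatten first, exactly as in the list comprehension approach.
all_edges = [edge for edges in edge_lists for edge in edges]

# dict.fromkeys keeps one key per distinct edge, in first-seen order.
unique_edges = list(dict.fromkeys(all_edges))
print(unique_edges)
```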
Upvotes: 0
Reputation: 863801
Use a list comprehension to flatten the nested lists of tuples:
Edges = [y for x in df.Description for y in x]
print(Edges)
[('AB', 'AC'), ('ACBO', 'PPK'), ('AD', 'AT'), ('CBO', 'PKD'),
('CBO', 'PKA'), ('FDC', 'ATO'), ('VBO', 'IKD'), ('CVB', 'PKD')]
Or use chain.from_iterable for better performance:
from itertools import chain
Edges = list(chain.from_iterable(df.Description))
print(Edges)
[('AB', 'AC'), ('ACBO', 'PPK'), ('AD', 'AT'), ('CBO', 'PKD'),
('CBO', 'PKA'), ('FDC', 'ATO'), ('VBO', 'IKD'), ('CVB', 'PKD')]
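For anyone who wants to try both approaches, here is a minimal reproducible sketch. The DataFrame is rebuilt from the sample in the question, assuming the Description column holds real Python lists of tuples and not their string representation. The variable names are only illustrative.

```python
from itertools import chain

import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "InvoiceNo": [534, 415, 315],
    "Description": [
        [("AB", "AC"), ("ACBO", "PPK")],
        [("AD", "AT"), ("CBO", "PKD"), ("CBO", "PKA")],
        [("FDC", "ATO"), ("VBO", "IKD"), ("CVB", "PKD")],
    ],
})

# Approach 1: nested list comprehension over the rows, then over each list.
edges_comprehension = [y for x in df.Description for y in x]

# Approach 2: chain.from_iterable lazily concatenates the per-row lists.
edges_chain = list(chain.from_iterable(df.Description))

# Both approaches flatten exactly one level and give the same result.
print(edges_comprehension == edges_chain)
```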
Upvotes: 6