Summing a transaction chain in a dataframe, rows linked by column values

Question

I'm trying to link multiple rows from a DataFrame in order to get all possible paths formed by connecting receiver ids to sender ids.

Here is an example of my DataFrame:

   transaction_id sender_id receiver_id  amount
0          213234       002         125      10
1          223322       017         354      90
2          343443       125         689      70
3          324433       689         233       5
4          328909       354         456      10

created with:

df = pd.DataFrame(
    {'transaction_id': {0: '213234', 1: '223322', 2: '343443', 3: '324433', 4: '328909'},
     'sender_id': {0: '002', 1: '017', 2: '125', 3: '689', 4: '354'},
     'receiver_id': {0: '125', 1: '354', 2: '689', 3: '233', 4: '456'},
     'amount': {0: 10, 1: 90, 2: 70, 3: 5, 4: 10}}
)

The result of my code should be a list of chained ids and the total amount for each transaction chain. For the first two rows in the above example, that's something like:

[('002', '125', '689', '233'), 85]
[('017', '354', '456'), 100]

I already tried to iterate through the rows and convert each row to an instance of a Node class, and then used methods for traversing a linked list, but I have no idea what the next step is here:

class Node:
    def __init__(self, transaction_id, sender_id, receiver_id, amount):
        self.transac = transaction_id
        self.val = sender_id
        self.next = receiver_id
        self.amount = amount
    def traverse(self):
        node = self # start from the head node
        while node is not None:
            print(node.val) # access the node value
            node = node.next # move on to the next node

for index, row in df.iterrows():
    index = Node(
        row["transaction_id"],
        row["sender_id"],
        row["receiver_id"],
        row["amount"]
    )

Additional information:

The sender_id values are unique; for each sender id there is only one possible transaction chain.
There are no cycles; there is never a chain where the receiver id points back to a sender id in the same path.

dee cue · Accepted Answer

I have no idea what the next step is here

With your current implementation, you can connect the Node objects by iterating over the nodes and pointing each node's next at the node whose sender matches its receiver. You can also add a visited property to the Node class so that, as you traverse the linked nodes, you only report unique chains, i.e. no reported chain is a sub-chain of another chain. However, if you want the chain for each sender_id, this may not be necessary.

Edit: I noticed that you mentioned the example of the expected result is for the first two rows. This implies that each sender_id should have its own chain. I modified the traverse method so that it can be used after all the nodes are connected.

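Edit: I reimplemented the visited property to get unique chains. Note that this relies on the first row of each chain appearing in the DataFrame before the other rows of that chain, as it does in the example; otherwise a sub-chain is reported instead of the full chain.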

import pandas as pd

df = pd.DataFrame(
    {'transaction_id': {0: '213234', 1: '223322', 2: '343443', 3: '324433', 4: '328909'},
     'sender_id': {0: '002', 1: '017', 2: '125', 3: '689', 4: '354'},
     'receiver_id': {0: '125', 1: '354', 2: '689', 3: '233', 4: '456'},
     'amount': {0: 10, 1: 90, 2: 70, 3: 5, 4: 10}}
)

class Node:
    def __init__(self,transaction_id,sender_id,receiver_id,amount):
        self.transac = transaction_id
        self.sender = sender_id
        self.receiver = receiver_id
        self.next = None
        self.amount = amount
        self.visited = False
    def traverse(self, chain=None, total=0):
        if self.visited:  # skip nodes already included in an earlier chain
            return None
        self.visited = True
        if chain is None:  # this is the beginning of the traversal
            chain = [self.sender]
        chain.append(self.receiver)
        total += self.amount
        if self.next is not None:
            return self.next.traverse(chain, total)
        return chain, total

transc = [Node( 
        row["transaction_id"],
        row["sender_id"],
        row["receiver_id"],
        row["amount"]
    ) for i, row in df.iterrows()]

# connect the nodes
for v in transc:
    for k in transc:
        # if v's receiver is k's sender, k is the next node after v
        if v.receiver == k.sender:
            v.next = k


summary = [i.traverse() for i in transc]
summary = [i for i in summary if i is not None] # drop the None results from nodes that were already visited

print(summary)

The output:

[
    (['002', '125', '689', '233'], 85), 
    (['017', '354', '456'], 100)
]
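
As an alternative to the Node class, the same chains can probably be built with a plain dictionary lookup, which avoids the nested loop and does not depend on the order of the rows. The following is only a minimal sketch, not part of the code above: build_chains and rows are hypothetical names, and it assumes what the question states, namely that sender ids are unique and that there are no cycles. A chain is started only from a sender that never appears as a receiver.

```python
def build_chains(rows):
    """Return a list of (ids_tuple, total_amount), one entry per full chain.

    rows is an iterable of (sender_id, receiver_id, amount) triples.
    Assumes unique sender ids and no cycles, as stated in the question.
    """
    # Map each sender to its single outgoing transaction.
    outgoing = {}
    for sender, receiver, amount in rows:
        outgoing[sender] = (receiver, amount)

    # A chain head is a sender that is never the receiver of another row.
    receivers = {receiver for receiver, _ in outgoing.values()}
    heads = [sender for sender in outgoing if sender not in receivers]

    chains = []
    for head in heads:
        path, total, current = [head], 0, head
        # Follow the links until an id has no outgoing transaction.
        while current in outgoing:
            current, amount = outgoing[current]
            path.append(current)
            total += amount
        chains.append((tuple(path), total))
    return chains


# The example data from the question as (sender_id, receiver_id, amount).
rows = [
    ("002", "125", 10),
    ("017", "354", 90),
    ("125", "689", 70),
    ("689", "233", 5),
    ("354", "456", 10),
]

print(build_chains(rows))
# [(('002', '125', '689', '233'), 85), (('017', '354', '456'), 100)]
```

With the DataFrame from the question, the rows could be passed as zip(df["sender_id"], df["receiver_id"], df["amount"]).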
