user2962397

Reputation: 405

How to iterate through rows in a pandas DataFrame to build a dictionary of members and parents?

I'm trying to cycle through each row of a pandas dataframe to build a dictionary of member to parent items.

Every value in the DataFrame should appear as a member (a key) exactly once. If a member has no parent, its parent becomes 'none'.

As an example:

import pandas as pd

df = pd.DataFrame({'level 5': {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g', 7: 'h', 8: 'i'},
                   'level 4': {0: 'g1', 1: 'g1', 2: 'g1', 3: 'g1', 4: 'g1', 5: 'g1', 6: 'g2', 7: 'g2', 8: 'g3'},
                   'level 3': {0: 'g4', 1: 'g4', 2: 'g4', 3: 'g4', 4: 'g4', 5: 'g4', 6: 'g4', 7: 'g4', 8: 'g6'},
                   'level 2': {0: 'g4', 1: 'g4', 2: 'g4', 3: 'g4', 4: 'g4', 5: 'g4', 6: 'g4', 7: 'g4', 8: 'g4'},
                   'level 1': {0: 'g5', 1: 'g5', 2: 'g5', 3: 'g5', 4: 'g5', 5: 'g5', 6: 'g5', 7: 'g5', 8: 'g5'}})

Which looks like:

  level 5 level 4 level 3 level 2 level 1
0       a      g1      g4      g4      g5
1       b      g1      g4      g4      g5
2       c      g1      g4      g4      g5
3       d      g1      g4      g4      g5
4       e      g1      g4      g4      g5
5       f      g1      g4      g4      g5
6       g      g2      g4      g4      g5
7       h      g2      g4      g4      g5
8       i      g3      g6      g4      g5

Note that all but the last row have two consecutive g4's, in level 3 and level 2.

I would like to build a dictionary that looks like this:

output = {'a': 'g1', 'g1': 'g4', 'g4': 'g5', 'g5': 'none', 'b': 'g1', 'c': 'g1', 'd': 'g1', 'e': 'g1', 'f': 'g1', 'g': 'g2', 'g2': 'g4', 'h': 'g2', 'i': 'g3', 'g3': 'g6', 'g6': 'g4'}

I've come close by applying a function to each row of df, but I can't accommodate the ragged hierarchy, where a value repeats across adjacent levels.

Upvotes: 0

Views: 56

Answers (1)

ansev

Reputation: 30920

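One approach is to pair each column with the column to its right and map each child to its parent:

cols = df.columns
my_dict = {}
# Walk adjacent column pairs: (level 5, level 4), (level 4, level 3), ...
for key, value in zip(cols[:-1], cols[1:]):
    # Later pairs overwrite earlier entries, so the g4 -> g4 entry from
    # (level 3, level 2) is replaced by g4 -> g5 from (level 2, level 1).
    my_dict.update(dict(zip(df[key], df[value])))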

print(my_dict)

{'a': 'g1',
 'b': 'g1',
 'c': 'g1',
 'd': 'g1',
 'e': 'g1',
 'f': 'g1',
 'g': 'g2',
 'h': 'g2',
 'i': 'g3',
 'g1': 'g4',
 'g2': 'g4',
 'g3': 'g6',
 'g4': 'g5',
 'g6': 'g4'}
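
One caveat: the loop above depends on the order of the updates. If the repeated value sat in the last two columns instead, a self-reference such as 'g5': 'g5' would survive. Below is a minimal sketch of a variant that skips such pairs. It assumes the columns run from the deepest level to the top level, as in the question, and the name child_parent_pairs is only illustrative.

```python
import pandas as pd

# Same data as in the question, built more compactly.
df = pd.DataFrame({
    'level 5': list('abcdefghi'),
    'level 4': ['g1'] * 6 + ['g2'] * 2 + ['g3'],
    'level 3': ['g4'] * 8 + ['g6'],
    'level 2': ['g4'] * 9,
    'level 1': ['g5'] * 9,
})


def child_parent_pairs(frame):
    """Map each value to the value in the next column, skipping repeats.

    Hypothetical helper: assumes columns are ordered from the deepest
    level to the top level.
    """
    cols = list(frame.columns)
    parents = {}
    for child_col, parent_col in zip(cols[:-1], cols[1:]):
        for child, parent in zip(frame[child_col], frame[parent_col]):
            if child != parent:  # a repeated value is not its own parent
                parents[child] = parent
    return parents


result = child_parent_pairs(df)
print(result)
```

For the example data this gives the same mapping as the loop above, but it no longer relies on a later column pair overwriting the self-reference.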

If you want the 'none' values, you can add them at the end by mapping every value in the last (top-level) column to 'none':

my_dict.update(dict.fromkeys(df[cols[-1]], 'none'))
print(my_dict)


{'a': 'g1', 'b': 'g1', 'c': 'g1', 'd': 'g1', 'e': 'g1',
 'f': 'g1', 'g': 'g2', 'h': 'g2', 'i': 'g3', 'g1': 'g4', 'g2': 'g4',
 'g3': 'g6', 'g4': 'g5', 'g6': 'g4', 'g5': 'none'}
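
Since the question asks about going row by row, here is a minimal row-wise sketch as an alternative. It assumes that repeated values only ever occur in adjacent columns, as in the example, and uses itertools.groupby to collapse them before pairing each value with the next one in the row. The name parents_by_row and the root_label parameter are illustrative, not part of pandas.

```python
from itertools import groupby

import pandas as pd

# Same data as in the question, built more compactly.
df = pd.DataFrame({
    'level 5': list('abcdefghi'),
    'level 4': ['g1'] * 6 + ['g2'] * 2 + ['g3'],
    'level 3': ['g4'] * 8 + ['g6'],
    'level 2': ['g4'] * 9,
    'level 1': ['g5'] * 9,
})


def parents_by_row(frame, root_label='none'):
    """Build a child -> parent dict by walking each row left to right.

    Hypothetical helper: assumes columns run from the deepest level to
    the top level and that repeats only occur in adjacent columns.
    """
    parents = {}
    # name=None yields plain tuples, so column names with spaces are fine.
    for row in frame.itertuples(index=False, name=None):
        # Collapse consecutive repeats, e.g. (a, g1, g4, g4, g5) -> [a, g1, g4, g5].
        chain = [value for value, _ in groupby(row)]
        # Pair each value with its successor; the last value gets root_label.
        for child, parent in zip(chain, chain[1:] + [root_label]):
            parents[child] = parent
    return parents


by_row = parents_by_row(df)
print(by_row)
```

Because rows are visited top to bottom, the keys come out in the same order as the dictionary in the question.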

Upvotes: 2
