Travis G
Travis G

Reputation: 1602

Multi level Dataframe Grouping

I am trying to do a Multi level Grouping with end result in List and Dictionary building. So for example if my Dataframe looks like this blow

 X      Y   Z
AAAA BBBB CCCCC
AAAA BBBB DDDDD
AAAA BBBB EEEEE
FFFF GGGG HHHHH
FFFF GGGG IIIII
JJJJ KKKK LLLLL

I am trying to merge Y and Z Columns into Dictionary based on grouping from X and Y. hence the end result be like

P       Q
AAAA    BBBB:[{CCCCC},{DDDDD},{EEEEE}]
FFFF    GGGG:[{HHHHH},{IIIII}]
JJJJ    KKKK:[{LLLLL}]

Is this something that is purely achievable by pandas Functions or do i have to loop and then then process ?

Upvotes: 2

Views: 187

Answers (2)

SeaBean
SeaBean

Reputation: 23217

You can do it in 2 steps:

  1. Create list of sets by lambda function within SeriesGroupBy.agg() of Series Z grouped by X and Y.
  2. Create dictionary with Y as key and list of sets created in step 1 as value by using df.apply() with pd.Series()

Codes:

df2 = (df.groupby(['X', 'Y'], as_index=False)['Z']
         .agg(lambda x: [{y} for y in x.values]))
df2 = df2.apply(lambda x: pd.Series({'P': x['X'], 'Q': {x['Y']: x['Z']}}), axis=1)

Result:

      P                                            Q
0  AAAA  {'BBBB': [{'CCCCC'}, {'DDDDD'}, {'EEEEE'}]}
1  FFFF             {'GGGG': [{'HHHHH'}, {'IIIII'}]}
2  JJJJ                        {'KKKK': [{'LLLLL'}]}

Alternatively, if you want to keep all columns in one place, you can also modify the 2nd step to use result_type='expand' instead of using pd.Series(), as follows:

df2 = (df.groupby(['X', 'Y'], as_index=False)['Z']
         .agg(lambda x: [{y} for y in x.values]))
df2[['P', 'Q']] = df2.apply(lambda x: [x['X'], {x['Y']: x['Z']}], result_type='expand', axis=1)

Result:

      X     Y                            Z     P                                            Q
0  AAAA  BBBB  [{CCCCC}, {DDDDD}, {EEEEE}]  AAAA  {'BBBB': [{'CCCCC'}, {'DDDDD'}, {'EEEEE'}]}
1  FFFF  GGGG           [{HHHHH}, {IIIII}]  FFFF             {'GGGG': [{'HHHHH'}, {'IIIII'}]}
2  JJJJ  KKKK                    [{LLLLL}]  JJJJ                        {'KKKK': [{'LLLLL'}]}

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195543

You can group by X and Y column and the aggregate to list. Then create the dictionaries with .apply:

df = df.groupby(["X", "Y"], as_index=False).agg(list).rename(columns={"X": "P"})
df["Q"] = df[["Y", "Z"]].apply(lambda x: {x["Y"]: x["Z"]}, axis=1)
print(df[["P", "Q"]])

Prints:

      P                                      Q
0  AAAA  {'BBBB': ['CCCCC', 'DDDDD', 'EEEEE']}
1  FFFF           {'GGGG': ['HHHHH', 'IIIII']}
2  JJJJ                    {'KKKK': ['LLLLL']}

Upvotes: 1

Related Questions