EFaden
EFaden

Reputation: 57

Pandas GroupBy Concatenation Issues

I have a data table that contains NAME, DATE, TYPE, VALUE A, VALUE B, VALUE C, ....

Theoretically the NAME, DATE, TYPE are supposed to be unique, but it turns out they are not, so I need to figure out how to handle it.

Example Input:

Bob, 1/1/18, AType, blah, test, test2
Bob, 1/1/18, AType, blah2, test, test2
Bob, 1/1/18, BType, blah, test, test2

Ideal Output

Bob, 1/1/18, AType, [blah,blah2], test, test2
Bob, 1/1/18, BType, blah, test, test2

My goal is to merge the VALUE A values into a JSON style array, and then take the last value for VALUE B and VALUE C.

I have tried:

data.groupby('NAME', 'DATE', 'TYPE').apply(lambda x: ','.join(set(x)))

But that merges ALL of the fields.... Or seems to.

Any help?

Upvotes: 1

Views: 48

Answers (1)

jpp
jpp

Reputation: 164783

This seems to work:

import pandas as pd

df = pd.DataFrame([['Bob', '1/1/18', 'AType', 'blah', 'test', 'test2'],
                   ['Bob', '1/1/18', 'AType', 'blah2', 'test', 'test3'],
                   ['Bob', '1/1/18', 'BType', 'blah', 'test', 'test2']],
                  columns=['NAME', 'DATE', 'TYPE', 'VALUE A', 'VALUE B', 'VALUE C'])

f = {'VALUE A': lambda x: set(x), 'VALUE B': 'last', 'VALUE C': 'last'}

df = df.groupby(['NAME', 'DATE', 'TYPE'])['VALUE A', 'VALUE B', 'VALUE C']\
       .agg(f).reset_index()
df['VALUE A'] = df['VALUE A'].map(list)

#   NAME    DATE   TYPE        VALUE A VALUE B VALUE C
# 0  Bob  1/1/18  AType  [blah2, blah]    test   test3
# 1  Bob  1/1/18  BType         [blah]    test   test2

Upvotes: 1

Related Questions