How to output a list of all values in a specified column in a grouped object

Question

I have the following problem: I have a grouped object. For each group, I want to make a comma-separated list of all the values contained in a specific column of that group. My code is as follows:

for key, group in df.groupby('Column1'):
    All_values_in_group = []
    for item, frame in group['Column2'].iteritems():
        list = frame.split(',')
        for value in list:
            All_values_in_group.append(value)
            print key
            print All_values_in_group

The idea is that I group my data by a specific column and create an empty list. Then, for each row, I build a list by splitting the string contained in that row at ','. Each value in this list is then appended to my desired output list, All_values_in_group. This list should be a 'summary' of all the data contained in Column2 across the rows of group X.

My problem now is that I do not get one list but several lists when I print All_values_in_group, like this (L1 is the group key):

L1
['string1']
L1
['string1', 'string2']
L1
['string1', 'string2', 'string3']

I only want one list for All_values_in_group containing all the values from Column2 in that group, much like the last row in the example, and I want to keep duplicates.

To make it clearer, here is an example of my data:

   Column1  Column2 
0     L1    string1,string2,string3
1     L1    string1
2     L1    string2,string3
3     L2    stringA,stringB

What I want is:

L1
All_values_in_group ['string1', 'string2', 'string3', 'string1', 'string2', 'string3']
L2
All_values_in_group ['stringA', 'stringB']

Does anybody know a way to make my code work like this? I have the feeling it's just something small, but I can't figure it out. Thanks in advance!

EdChum · Accepted Answer

You can group by 'Column1' and apply a lambda that calls join to concatenate all the string values in each group. If you want a list object, wrapping the result in brackets puts the joined string inside a list:

In [22]:
df.groupby('Column1')['Column2'].apply(lambda x: [','.join(x)])

Out[22]:
Column1
L1    [string1,string2,string3,string1,string2,string3]
L2                                    [stringA,stringB]
Name: Column2, dtype: object
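
Note that [','.join(x)] yields a list holding a single joined string per group. If separate list elements are wanted, as in the desired output in the question, something along these lines may help. This is only a sketch, assuming Python 3 and a recent pandas version; the sample frame is rebuilt from the question's data, and the names loop_result and flat are made up for illustration. It shows both the original loop with the print statements moved out of the inner loop, and a groupby/apply variant that splits the joined string again:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Column1': ['L1', 'L1', 'L1', 'L2'],
    'Column2': ['string1,string2,string3', 'string1',
                'string2,string3', 'stringA,stringB'],
})

# Variant 1: the loop from the question, printing once per group
# (after the inner loop has finished) instead of once per value.
loop_result = {}  # hypothetical helper dict, not in the original code
for key, group in df.groupby('Column1'):
    all_values_in_group = []
    for cell in group['Column2']:
        all_values_in_group.extend(cell.split(','))
    loop_result[key] = all_values_in_group
    print(key)
    print('All_values_in_group', all_values_in_group)

# Variant 2: join all rows of a group into one string, then split it
# so that every value becomes its own list element (duplicates kept).
flat = df.groupby('Column1')['Column2'].apply(
    lambda x: ','.join(x).split(',')
)
```

Either way, the result for L1 keeps the duplicates, and flat['L1'] can be indexed like any other Series entry.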
