user3013706

Reputation: 451

python group by two columns, extract first element by one index

If I use the groupby function, e.g. Data.groupby(['id','company']).size(), it gives a result like:

id   company 
1    a        2
     b        3
     c        6
2    d        1
     e        5

But how can I extract the numbers [2, 1], i.e. the first element of each group in the zeroth index level, with each group sorted by the first index level?

Upvotes: 1

Views: 4892

Answers (1)

ely

Reputation: 77404

First, let:

agg_df = Data.groupby(['id','company']).size()

Assuming you want the first entry of each group of elements that share a value in the zeroth level of the MultiIndex, with each group sorted by the first index level as you prefer (after the updated comment, this appears to be the desired output):

unique_zeroth_level = dict(agg_df.index.values).keys()
group_first_vals = [
    # Series.select was removed in pandas 1.0; xs takes the same cross-section.
    agg_df.xs(idx_val, level=0).iloc[0]
    for idx_val in unique_zeroth_level]
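
As a possible shortcut for the case above, grouping the aggregated Series again on its zeroth index level and taking the first value of each group should give the same numbers. This is only a sketch: the `Data` frame below is made up to reproduce the counts shown in the question, and it relies on groupby's default sorting to put each group's entries in order.

```python
import pandas as pd

# Hypothetical data that reproduces the counts in the question.
Data = pd.DataFrame({
    "id": [1] * 11 + [2] * 6,
    "company": ["a"] * 2 + ["b"] * 3 + ["c"] * 6 + ["d"] * 1 + ["e"] * 5,
})

agg_df = Data.groupby(["id", "company"]).size()

# Group on the zeroth index level and take the first value in each group.
group_first_vals = agg_df.groupby(level=0).first().tolist()
print(group_first_vals)  # [2, 1]
```

Note that `first()` returns the first non-null value per group, which is the same as the first entry here because `size()` never produces missing counts.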

Assuming you're asking for the unique elements of the zeroth level of the resulting MultiIndex:

In this particular case, since the returned result is a Series, you can make use of a trick using unstack:

agg_df.unstack(level=0).columns.values

or use a dict constructor

dict(agg_df.index.values).keys()
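
Another way to get the same unique values, which may read more directly than either trick, is to ask the index for its zeroth level. The sketch below again uses a made-up `Data` frame matching the counts in the question.

```python
import pandas as pd

# Hypothetical data that reproduces the counts in the question.
Data = pd.DataFrame({
    "id": [1] * 11 + [2] * 6,
    "company": ["a"] * 2 + ["b"] * 3 + ["c"] * 6 + ["d"] * 1 + ["e"] * 5,
})

agg_df = Data.groupby(["id", "company"]).size()

# Pull the zeroth level's labels for every row, then de-duplicate them.
unique_ids = agg_df.index.get_level_values(0).unique().tolist()
print(unique_ids)  # [1, 2]
```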

Assuming you want the result for (1, 'a') and (2, 'd') in particular, and that you want to access them by their index values (not just because they happen to be the lexicographically first entries in their respective groups), use .loc (the older .ix indexer was removed in pandas 1.0):

agg_df.loc[[(1, 'a'), (2, 'd')]]

Upvotes: 4
