bmcisme
bmcisme

Reputation: 15

Grouping the counts of two pandas columns into a dataframe

I have the following data frame:

Sport AgeGroup
Baseball 20s
Football 20s
Baseball 30s
Baseball 20s
Football 20s
Football 20s
Football 30s

And the goal here is to get the counts between the two columns such that it gets the counts of each column into a list of lists in the following format: [['baseball', '20s', 2], ['baseball', '30s', 1], ['football', '20s', 3], ['football', '30s', 1]] where each list consists of the format [sport, ageGroup, count where both exist in the table].

Thus far, I've gotten what I needed in a way by doing: sport_age_count = df.groupby(["sport", "ageGroup"]).size(). The problem here though is that the result isn't indexable and only treats the count like a singular list. I get the following result with the simple code I have above:

Sport AgeGroup
Baseball 20s 2
30s 1
Football 20s 3
30s 1

The problem is, when I try to use baseball, football, 20s, or 30s, it won't let me. I can only access the counts. Also, besides the fact, I'd really like this to be formatted into the bold list of lists like above as well. I wasn't sure if there's any alternative python tricks out there besides using extensive for loops to build this list myself.

Upvotes: 1

Views: 51

Answers (2)

Andrej Kesely
Andrej Kesely

Reputation: 195458

Another solution, using .agg():

print(
    df.groupby(["Sport", "AgeGroup"], as_index=False)
    .size()
    .agg(list, axis=1)
    .tolist()
)

Prints:

[['Baseball', '20s', 2], ['Baseball', '30s', 1], ['Football', '20s', 3], ['Football', '30s', 1]]

Upvotes: 2

G. Anderson
G. Anderson

Reputation: 5955

If you want an array (makes for easier indexing) you can reset the index and use to_numpy() after your groupby

sport_age_count.reset_index().to_numpy()

array([['Baseball', '20s', 2],
       ['Baseball', '30s', 1],
       ['Football', '20s', 3],
       ['Football', '30s', 1]], dtype=object)

But if you want a list of lists, you can use numpy's tolist()

sport_age_count.reset_index().to_numpy().tolist()

[['Baseball', '20s', 2],
 ['Baseball', '30s', 1],
 ['Football', '20s', 3],
 ['Football', '30s', 1]]

Upvotes: 1

Related Questions