Muhammad Nauman

Reputation: 1

Count the frequency of items in one column in relation to criteria in another column in Python

I have a data frame that looks something like this:

Category Topic
Category1 Topic1
Category2 Topic2
Category1 Topic2
Category3 Topic3
Category2 Topic3
Category3 Topic3

And I want an output like this:

Category   Topic   Frequency
Category1  Topic1
           Topic2
           Topic3
Category2  Topic1
           Topic2
           Topic3
Category3  Topic1
           Topic2
           Topic3

I am just starting out with Python and I'd really appreciate it if someone could help me out with this.

Upvotes: 0

Views: 124

Answers (1)

Pierre D

Reputation: 26221

If "frequency" is meant to be the relative frequency of each topic within its category, a basic approach is:

df.groupby('Category')['Topic'].value_counts(normalize=True)

This returns a Series indexed by (Category, Topic). For example, on your input data we get:

Category   Topic 
Category1  Topic1    0.5
           Topic2    0.5
Category2  Topic2    0.5
           Topic3    0.5
Category3  Topic3    1.0
Name: Topic, dtype: float64
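To try this end to end, here is a minimal self-contained sketch. It rebuilds your sample as a DataFrame (the literal construction is an assumption about how your data is loaded) and computes the relative frequencies shown above. In case "frequency" means raw counts instead, it also shows plain `value_counts()` without `normalize`:

```python
import pandas as pd

# Rebuild the sample data from the question (assumed construction)
df = pd.DataFrame({
    'Category': ['Category1', 'Category2', 'Category1',
                 'Category3', 'Category2', 'Category3'],
    'Topic': ['Topic1', 'Topic2', 'Topic2',
              'Topic3', 'Topic3', 'Topic3'],
})

# Share of each topic within its category (each category sums to 1.0)
freq = df.groupby('Category')['Topic'].value_counts(normalize=True)

# Raw number of rows for each (Category, Topic) pair
counts = df.groupby('Category')['Topic'].value_counts()

print(freq)
print(counts)
```

Both results are Series with a (Category, Topic) MultiIndex, so a single value can be looked up with e.g. `freq.loc[('Category3', 'Topic3')]`.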

To get output organized like your example, which appears to be a DataFrame with three columns, convert the Series to a frame and reset the index:

out = (
    df
    .groupby('Category')['Topic']
    .value_counts(normalize=True)
    .to_frame('frequency')
    .reset_index()
)

Again, on your input sample:

>>> out
    Category   Topic  frequency
0  Category1  Topic1        0.5
1  Category1  Topic2        0.5
2  Category2  Topic2        0.5
3  Category2  Topic3        0.5
4  Category3  Topic3        1.0
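Note that your example lists every topic under every category (e.g. Topic3 under Category1), while the output above only contains pairs that actually occur. If you also want the missing pairs with a frequency of 0, one possible sketch, assuming all categories and topics that matter appear somewhere in `df`, is to reindex the Series against the full Category × Topic grid before building the frame:

```python
import pandas as pd

# Sample data from the question (assumed construction)
df = pd.DataFrame({
    'Category': ['Category1', 'Category2', 'Category1',
                 'Category3', 'Category2', 'Category3'],
    'Topic': ['Topic1', 'Topic2', 'Topic2',
              'Topic3', 'Topic3', 'Topic3'],
})

freq = df.groupby('Category')['Topic'].value_counts(normalize=True)

# Every (Category, Topic) pair, in sorted order, including unseen pairs
full_index = pd.MultiIndex.from_product(
    [sorted(df['Category'].unique()), sorted(df['Topic'].unique())],
    names=['Category', 'Topic'],
)

# Pairs that never occur in df get frequency 0
out = (
    freq
    .reindex(full_index, fill_value=0)
    .to_frame('frequency')
    .reset_index()
)
print(out)
```

This yields 9 rows (3 categories × 3 topics) in the same order as your example, with 0.0 for combinations such as Category1/Topic3. If a topic can be valid but absent from the data entirely, you would need to supply that list of topics yourself instead of using `df['Topic'].unique()`.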

Upvotes: 1
