I have a data frame that looks something like this: Category Topic Category1 Topic1 Category2 Topic2 Category1 Topic2 Category3 Topic3 Category2 Topic3 Category3 Topic3 And I want an output like this: Category Topic Frequency Category1 Topic1 Topic2 Topic3 Catgeory2 Topic1 Topic2 Topic3 Category3 Topic1 Topic2 Topic3 I am just starting out with python and I'd really appreciate it if someone could help me out with this.

Reputation: 1

count frequency of items in one column in relation to criteria in another column in python

I have a data frame that looks something like this:

Category	Topic
Category1	Topic1
Category2	Topic2
Category1	Topic2
Category3	Topic3
Category2	Topic3
Category3	Topic3

And I want an output like this:

Category	Topic	Frequency
Category1	Topic1
	Topic2
	Topic3
Catgeory2	Topic1
	Topic2
	Topic3
Category3	Topic1
	Topic2
	Topic3

I am just starting out with python and I'd really appreciate it if someone could help me out with this.

Upvotes: 0

Answers (1)

Pierre D

Reputation: 26321

If the frequency is meant to capture the frequency of topic within each category, then, a basic approch involves:

df.groupby('Category')['Topic'].value_counts(normalize=True)

Which is a Series. For example, on your input data, we get:

Category   Topic 
Category1  Topic1    0.5
           Topic2    0.5
Category2  Topic2    0.5
           Topic3    0.5
Category3  Topic3    1.0
Name: Topic, dtype: float64

For an output organized as per your example, that appears to be a DataFrame with three columns:

out = (
    df
    .groupby('Category')['Topic']
    .value_counts(normalize=True)
    .to_frame('frequency')
    .reset_index()
)

Again, on your input sample:

>>> out
    Category   Topic  frequency
0  Category1  Topic1        0.5
1  Category1  Topic2        0.5
2  Category2  Topic2        0.5
3  Category2  Topic3        0.5
4  Category3  Topic3        1.0

Upvotes: 1

count frequency of items in one column in relation to criteria in another column in python

Answers (1)

Related Questions