Reputation: 249
I have a dataframe in which I need to find the top 20 most repeated sentences using Python. Please let me know how to go about it.
Column A
Hello How are you?
This ticket is not valid
How are things at you end?
Hello How are you?
How can I help you?
Please help me with tickets
This ticket is not valid
Hello How are you?
Expected Output
Column A                    Frequency of repeated sentence
Hello How are you?          3
This ticket is not valid    2
How can I help you?         1
.
.
.
Code so far
import pandas as pd

df = pd.read_csv("C:\\Users\\aaa\\abc\\Analysis\\chat.csv", encoding="ISO-8859-1")
df['word_count'] = df['Column A'].apply(lambda x: len(str(x).split(" ")))
df[['Column A','word_count']].head()
for i, g in df.groupby('Column A'):
    print('Frequency of repeating sentence : {}'.format(g['Column A'].duplicated(keep=False).sum()))
I need the result as a dataframe with "Column A" and "Frequency" columns that can be written to CSV.
Upvotes: 3
Views: 194
Reputation: 2819
Try this:
df['count'] = df.groupby('Column A')['Column A'].transform('count')
df = df.sort_values(by='count', ascending=False)
print(df.head(20))
Upvotes: 2
Reputation: 879
Try this
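# Number of rows per distinct sentence
freq_series = df.groupby(['Column A']).size()
# Build the two-column result requested in the question
new_df = pd.DataFrame({'Column A': freq_series.index, 'Frequency': freq_series.values})
new_df.to_csv('<your csv name>.csv', index=False)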
Upvotes: 0
Reputation: 1157
df['count'] = df.groupby('Sentence')['Sentence'].transform('count')
df = df.sort_values(by = 'count', ascending = False)
df.head(20)
This adds a 'count' column to the original dataframe containing the frequency of each row's sentence (the column is named 'Sentence' here; substitute your own column name). transform() returns a Series aligned with the original dataframe's index, so it can be assigned directly as a new column.
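As a rough illustration of that alignment, here is a minimal sketch using the sample sentences from the question. The column name 'Sentence' follows this answer, and the drop_duplicates step is an addition of mine so that each sentence shows up only once in the top 20; neither is part of the original data.

```python
import pandas as pd

# Sample rows copied from the question; 'Sentence' is the column name assumed in this answer.
df = pd.DataFrame({"Sentence": [
    "Hello How are you?",
    "This ticket is not valid",
    "How are things at you end?",
    "Hello How are you?",
    "How can I help you?",
    "Please help me with tickets",
    "This ticket is not valid",
    "Hello How are you?",
]})

# transform('count') yields one value per original row, so the result
# has the same length and index as df and can be assigned as a column.
df["count"] = df.groupby("Sentence")["Sentence"].transform("count")

# Added step (not in the answer above): keep one row per sentence,
# then take the 20 most frequent.
top = (
    df.drop_duplicates("Sentence")
      .sort_values("count", ascending=False)
      .head(20)
      .reset_index(drop=True)
)
print(top)
```

Without the drop_duplicates step, head(20) would return the most frequent sentence repeated once per occurrence rather than 20 distinct sentences.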
Upvotes: 1
Reputation: 21749
Here's a way using .value_counts():
df['Column A'].value_counts()
To add it as a column, you can do:
df['Frequency'] = df['Column A'].map(df['Column A'].value_counts())
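To get the two-column result the question asks for, something along these lines should work. It is only a sketch: the sample rows come from the question, and the names result and csv_text are mine. value_counts() sorts counts in descending order by default, so head(20) keeps the 20 most frequent sentences.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"Column A": [
    "Hello How are you?",
    "This ticket is not valid",
    "How are things at you end?",
    "Hello How are you?",
    "How can I help you?",
    "Please help me with tickets",
    "This ticket is not valid",
    "Hello How are you?",
]})

# value_counts() returns a Series indexed by sentence, largest count first.
counts = df["Column A"].value_counts().head(20)

# Turn the Series into a two-column dataframe with explicit column names.
result = counts.rename_axis("Column A").reset_index(name="Frequency")
print(result)

# With no path, to_csv returns the CSV text; pass a file path instead
# to write the file to disk.
csv_text = result.to_csv(index=False)
print(csv_text)
```

Naming the columns explicitly with rename_axis and reset_index(name=...) avoids relying on the default column names, which differ between pandas versions.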
Upvotes: 4