Reputation: 321
I have a list of 'words' I want to count below
word_list = ['one','two','three']
And I have a column within pandas dataframe with text below.
TEXT | USER | ID
-------------------------------------------|---------|------
"Perhaps she'll be the one for me." | User 1 | 100
"Is it two or one?" | User 1 | 100
"Mayhaps it be three afterall..." | User 2 | 150
"Three times and it's a charm." | User 2 | 150
"One fish, two fish, red fish, blue fish." | User 2 | 150
"There's only one cat in the hat." | User 3 | 200
"One does not simply code into pandas." | User 3 | 200
"Two nights later..." | User 1 | 100
"Quoth the Raven... nevermore." | User 2 | 150
The desired output that I would like is the following below, where I want to count the number of unique users that has text related to any word in word_list, using the data found in the "TEXT" column. After counting the unique users, I also want to count the sum of the followers related to each tweet, associated with the unique user count of the word.
Word | Unique User Count | ID Sum
one | 3 | 450
two | 2 | 250
three| 1 | 150
Is there a way to do this in Python 2.7?
Upvotes: 0
Views: 218
Reputation: 323306
I break down steps
df.columns=['TEXT','USER','ID']
df[word_list]=df.TEXT.str.lower().apply(lambda x : pd.Series([x.find(y) for y in word_list])).ne(-1)
df1=df[['USER','one','two','three','ID']].set_index(['USER','ID']).astype(int).replace({0:np.nan})
Target=df1.stack().reset_index().groupby('level_2').agg({'USER':lambda x : len(set(x)),'ID':lambda x : sum(set(x))})
Target=Target.reset_index()
Target.columns=['Word','Unique User Count','ID Sum']
Target
Out[97]:
Word Unique User Count ID Sum
0 one 3 450
1 three 1 150
2 two 2 250
Upvotes: 1