Find the number of occurrences of each word in a sentence in each target category

Question

I have something like this:

Sentence                                        Target
We regret to inform you about the result.        1
We are glad to inform you about the result.      2
We would like to inform you about the result.   3
We are surprised to see the result.              4

I want a word count per target that looks something like this:

Word         Target 1    Target 2    Target 3    Target 4
Result          1           1           1           1
Inform          1           1           1           0
Surprised       0           0           0           1

... and so on. How do I do this?

cs95 · Accepted Answer

You'll need to:

1. remove punctuation and lowercase the data
2. split on whitespace
3. stack to create a Series
4. groupby on Target
5. find the value_counts of words for each target
6. unstack the result for your desired output

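# df is the question's DataFrame with columns Sentence and Target.
# regex=True is required on pandas >= 2.0, where str.replace treats the
# pattern as a literal string by default.
(df.Sentence.str.replace(r'[^\w\s]', '', regex=True)  # strip punctuation
   .str.lower()                   # normalize case
   .str.split(expand=True)        # one column per word position
   .set_index(df.Target)          # label each row with its target
   .stack()                       # long Series of words
   .groupby(level=0)              # group by target
   .value_counts()                # count each word per target
   .unstack(0, fill_value=0)      # targets become columns, gaps become 0
   .add_prefix('Target '))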


Target     Target 1  Target 2  Target 3  Target 4
about             1         1         1         0
are               0         1         0         1
glad              0         1         0         0
inform            1         1         1         0
like              0         0         1         0
regret            1         0         0         0
result            1         1         1         1
see               0         0         0         1
surprised         0         0         0         1
the               1         1         1         1
to                1         1         1         1
we                1         1         1         1
would             0         0         1         0
you               1         1         1         0
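
For a copy-paste check, here is a minimal self-contained sketch that rebuilds the question's sample data and produces the same table a different way: `explode` plus `pd.crosstab` instead of the stack/groupby chain above. This is an alternative, not the method from the answer. The helper name `word_counts_by_target` and the intermediate `Word` column are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Sentence": [
        "We regret to inform you about the result.",
        "We are glad to inform you about the result.",
        "We would like to inform you about the result.",
        "We are surprised to see the result.",
    ],
    "Target": [1, 2, 3, 4],
})


def word_counts_by_target(frame):
    """Count each word per target (hypothetical helper, for illustration)."""
    # Strip punctuation, lowercase, and split each sentence into a list of words.
    words = (
        frame["Sentence"]
        .str.replace(r"[^\w\s]", "", regex=True)
        .str.lower()
        .str.split()
    )
    # One row per (word, target) pair; reset the index so it is unique again.
    exploded = (
        frame.assign(Word=words)
        .explode("Word")
        .reset_index(drop=True)
    )
    # Cross-tabulate words against targets; missing pairs are filled with 0.
    table = pd.crosstab(exploded["Word"], exploded["Target"])
    return table.add_prefix("Target ")


counts = word_counts_by_target(df)
print(counts)
```

Since `crosstab` fills absent word/target pairs with zero on its own, no `fill_value` is needed here.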
