Stefan Christoph

Reputation: 43

How to create a customized trade/law lexicon for R text analysis

I am planning to do text analysis in R, similar to sentiment analysis, but with my own custom dictionary that classifies words as "trade" or "law".

I have all the required words for the dictionary in an Excel file. It looks like this:

> %
> 1 Trade
> 2 Law
> %
> business    1
> exchange    1
> industry    1
> rule        2
> settlement  2
> umpire      2
> court       2
> tribunal    2
> lawsuit     2
> bench       2
> courthouse  2
> courtroom   2

What steps do I need to take to convert this into a format R can use and then apply it to my text corpus?

Thank you for your help!

Upvotes: 1

Views: 827

Answers (1)

phiver

Reputation: 23608

Create a data.frame with two columns and store it somewhere, for example as an .rds file, a database table, or an Excel file, so you can load it whenever you need it.
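
As a minimal sketch of the .rds option, assuming a small hypothetical lexicon and a temporary file path (neither is from the question), saving and reloading with base R could look like this:

```r
# Hypothetical two-column lexicon: one row per word, sector 1 = trade, 2 = law
lexicon <- data.frame(
  word = c("business", "exchange", "court", "lawsuit"),
  sector = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Assumed location; in practice use a fixed path inside your project
lexicon_path <- file.path(tempdir(), "trade_law_lexicon.rds")

# Write the data.frame to disk once ...
saveRDS(lexicon, lexicon_path)

# ... and read it back whenever it is needed
lexicon_loaded <- readRDS(lexicon_path)
str(lexicon_loaded)
```

If the words stay in Excel, a reader package such as readxl can load the sheet into a data.frame instead; the column layout would need to match the two columns used here.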

Once you have the data in a data.frame, you can use joins or dictionary lookups to match it against the words in your text corpus. In the scoring data.frame I used 1 and 2 to represent the sectors (1 = trade, 2 = law), but you can use words as well.

See the example below using tidytext, but read up on sentiment analysis and use whichever package suits your needs.
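
If you prefer words over numeric codes, one possible approach is to turn the codes into a labelled factor. The sketch below uses a hypothetical four-word lexicon and an assumed column name, sector_label:

```r
# Hypothetical lexicon using the numeric codes from the question
lexicon <- data.frame(
  word = c("business", "industry", "court", "lawsuit"),
  sector = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Map code 1 to "Trade" and code 2 to "Law"; sector_label is an assumed name
lexicon$sector_label <- factor(
  lexicon$sector,
  levels = c(1L, 2L),
  labels = c("Trade", "Law")
)

lexicon
```

A join on the word column then carries the readable label along with each match.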

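library(tidytext)
library(dplyr)

# Two example documents; my_scoring_df is defined under "Data" below
text_df <- data.frame(id = 1:2,
                      text = c("The business is in the mining industry and has a settlement.",
                               "The court ordered the business owner to settle the lawsuit."))

# Split the text into one word per row, then keep only words found in the lexicon
text_df %>%
  unnest_tokens(word, text) %>%
  inner_join(my_scoring_df)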

Joining, by = "word"
  id       word sector
1  1   business      1
2  1   industry      1
3  1 settlement      2
4  2      court      2
5  2   business      1
6  2    lawsuit      2
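
To turn the matched words into a score per document, one option is to count matches per id and sector. The sketch below rebuilds the six matched rows shown above by hand (the name matches is an assumption) and uses base R's xtabs for the count:

```r
# The six matched rows from the join output above, entered by hand
matches <- data.frame(
  id = c(1L, 1L, 1L, 2L, 2L, 2L),
  word = c("business", "industry", "settlement", "court", "business", "lawsuit"),
  sector = c(1L, 1L, 2L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are documents, columns are sectors (1 = trade, 2 = law)
counts <- xtabs(~ id + sector, data = matches)
counts

# Share of each sector within a document
prop.table(counts, margin = 1)
```

Here document 1 has two trade matches and one law match, and document 2 has one trade match and two law matches.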

Data:

my_scoring_df <- structure(list(word = c("business", "exchange", "industry", "rule", 
"settlement", "umpire", "court", "tribunal", "lawsuit", "bench", 
"courthouse", "courtroom"), sector = c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-12L))

Upvotes: 1
