Reputation: 119
I want to see the words included in a dictionary. Here is my dictionary:
Name              Type                               Value
dict_lg           list [2] (quanteda::dictionary2)   List of length 2
  NEGATIVE        character [2867]                   'à côrnes' 'à court de personnel'
  POSITIVE        list [1] (quanteda::dictionary2)   List of length 1
    VÉRITÉ* (1))  character [0]
I would like to see the words included in each list (NEGATIVE, POSITIVE). If I do:
dict_lg <- dictionary(file = "frlsd/frlsd.cat", encoding = "UTF-8")
dict_lg$NEGATIVE
it prints the list of negative words, but if I do:
dict_lg$POSITIVE
I obtain:
Dictionary object with 1 key entry.
- [VÉRITÉ* (1))]:
or if I do
dict_lg[["POSITIVE"]][["VÉRITÉ* (1))"]]
I obtain
character(0)
How can I see the list of positive words? The original dictionary file is this one: https://www.poltext.org/fr/donnees-et-analyses/lexicoder
Upvotes: 0
Views: 106
Reputation: 14902
The problem lies with the file you referenced at https://www.poltext.org/fr/donnees-et-analyses/lexicoder. The entry "VÉRITÉ*" under the key "POSITIVE" has an extra ")" after its "(1)", so it ends up as an empty nested key instead of a word.
I removed the extra ")", loaded the edited file, and the dictionary now behaves properly:
library("quanteda")
#> Package version: 3.3.1
#> Unicode version: 14.0
#> ICU version: 71.1
#> Parallel computing: 10 of 10 threads used.
#> See https://quanteda.io for tutorials and examples.
dict <- dictionary(file = "~/Downloads/frlsd_edited.cat")
print(dict, max_nval = 6)
#> Dictionary object with 2 key entries.
#> - [NEGATIVE]:
#> - à côrnes, à court de personnel , à l'étroit, à peine*, abais*, abandon* [ ... and 2,861 more ]
#> - [POSITIVE]:
#> - à l'épreuve*, à la mode, abondamment, abondance, abondant*, abonde* [ ... and 1,278 more ]
head(dict$POSITIVE)
#> [1] "à l'épreuve*" "à la mode" "abondamment" "abondance" "abondant*"
#> [6] "abonde*"
head(dict$NEGATIVE)
#> [1] "à côrnes" "à court de personnel " "à l'étroit"
#> [4] "à peine*" "abais*" "abandon*"
Created on 2023-07-24 with reprex v2.0.2
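If you would rather not edit the file by hand, the stray parenthesis could probably be removed programmatically before loading. This is only a sketch: it assumes the offending line contains the literal text "(1))", as the printed key suggests, and it works on a small made-up stand-in file, not the real frlsd.cat. The names `raw_path`, `fixed_path`, `lines` and `fixed` are purely illustrative.

```r
# Toy stand-in for the .cat file; the real file's layout is assumed, not copied.
raw_path <- tempfile(fileext = ".cat")
writeLines(
  c("POSITIVE", "\tVÉRITÉ* (1))", "\tABONDANCE (1)"),
  raw_path, useBytes = TRUE
)

# Read the file and replace the doubled closing parenthesis with a single one.
# fixed = TRUE makes this a literal match, so no regex escaping is needed.
lines <- readLines(raw_path, encoding = "UTF-8")
fixed <- sub("(1))", "(1)", lines, fixed = TRUE)

# Write the repaired copy next to the original instead of overwriting it.
fixed_path <- tempfile(fileext = ".cat")
writeLines(fixed, fixed_path, useBytes = TRUE)
print(fixed)
```

With the real file, you would point `raw_path` at your copy of frlsd.cat and pass `fixed_path` to `dictionary(file = ...)` as in the reprex above.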
Upvotes: 1
Reputation: 6921
You can examine the list structure of the dictionary like so:
rapply(dict_lg, f = \(i) i, how = 'list') |> str()
... which suggests that the structure was messed up (either at generation of the cat-file or upon import):
List of 2
 $ NEGATIVE:List of 1
  ..$ : chr [1:2867] "à côrnes" "à court de personnel " "à l'étroit" "à peine*" ...
 $ POSITIVE:List of 2
  ..$ VÉRITÉ* (1)):List of 1
  .. ..$ : chr(0)
  ..$ : chr [1:1283] "à l'épreuve*" "à la mode" "abondamment" "abondance" ...
... however you can pull all terms from list item 'POSITIVE' like this:
rapply(dict_lg, f = \(i) i, how = 'list')$POSITIVE
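To get a single character vector and skip the empty malformed entry, the nested result can be flattened with `unlist()`. The sketch below uses a plain list called `toy` that mirrors the `str()` output above with made-up values; it is not a real quanteda dictionary, so treat it as an illustration of the list handling only.

```r
# Plain nested list mirroring the str() output above (toy values only).
toy <- list(
  NEGATIVE = list(c("à côrnes", "abais*")),
  POSITIVE = list(
    "VÉRITÉ* (1))" = list(character(0)),  # malformed entry: a key with no terms
    c("à la mode", "abondance")           # the actual positive terms
  )
)

# unlist() recurses through the nesting; the empty character(0) contributes
# nothing, so only the real terms remain.
positive_terms <- unlist(toy$POSITIVE, use.names = FALSE)
print(positive_terms)
```

On the real object, the same call would be applied to the result of `rapply(dict_lg, f = \(i) i, how = 'list')$POSITIVE`.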
Edit: to convert the dictionary into a data frame of terms and sentiments, e.g. to keep only the terms with negative sentiment:
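library(dplyr)
# Flatten the dictionary to a named character vector, then tag each term
# with the sentiment key recovered from the element names.
rapply(dict_lg, f = \(i) i, how = 'unlist') %>%
  data.frame(term = .,
             sentiment = gsub('(POSITIVE|NEGATIVE).*', '\\1', names(.))
  ) %>%
  filter(sentiment == 'NEGATIVE')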
term sentiment
NEGATIVE1 à côrnes NEGATIVE
NEGATIVE2 à court de personnel NEGATIVE
NEGATIVE3 à l'étroit NEGATIVE
NEGATIVE4 à peine* NEGATIVE
NEGATIVE5 abais* NEGATIVE
NEGATIVE6 abandon* NEGATIVE
## truncated
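If you prefer to avoid the dplyr dependency, roughly the same table can be built in base R with `stack()`. Again this is a sketch on a made-up plain list (`toy`) shaped like the broken dictionary, not on the real object; `flat`, `terms` and `negative` are illustrative names.

```r
# Toy list shaped like the broken dictionary (made-up values).
toy <- list(
  NEGATIVE = list(c("abais*", "abandon*")),
  POSITIVE = list("VÉRITÉ* (1))" = list(character(0)), c("abondance"))
)

# Flatten each sentiment key to one character vector of terms.
flat <- lapply(toy, unlist, use.names = FALSE)

# stack() turns a named list of vectors into a two-column data frame
# (values, ind); rename the columns to match the dplyr version above.
terms <- stack(flat)
names(terms) <- c("term", "sentiment")

# Keep only the negative terms.
negative <- terms[terms$sentiment == "NEGATIVE", ]
print(negative)
```

Unlike the `gsub()` approach, this takes the sentiment label directly from the top-level list names, so it does not depend on how `unlist()` numbers the element names.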
Upvotes: 2