Reputation: 3
I have a feature co-occurrence matrix of 8,347 by 8,347 with tri = FALSE. I would like to be able to select a feature individually so that I can see what terms frequently co-occur with it. Seemingly this would entail selecting the column for the feature and sorting the associated rows in descending order.
fcm_select doesn't work, because it isolates the term in both the column and the row:
SELECT_FROM_FCM <- fcm_select(
  MY_FCM,
  pattern = "FEATURE",
  selection = "keep",
  valuetype = "glob",
  case_insensitive = TRUE
)
View(SELECT_FROM_FCM)
--------------------
| | FEATURE |
--------------------
| FEATURE | 667 |
--------------------
dfm_subset also doesn't seem to work. Am I going about this the wrong way?
Upvotes: 0
Views: 409
Reputation: 14902
You can form the fcm and then select from it using normal matrix indexing operations. In this example, I form a document-context feature co-occurrence matrix from the last 10 inaugural addresses, then select the columns for "war" and "terror" to see which features co-occur with them.
library("quanteda")
## Package version: 2.0.1
fcmat <- data_corpus_inaugural %>%
tail(10) %>%
tokens(remove_punct = TRUE) %>%
fcm()
# select the columns for specific features
fcmat[, c("war", "terror")]
## Feature co-occurrence matrix of: 3,467 by 2 features.
##            features
## features    war terror
##   Senator    10      2
##   Hatfield    1      1
##   Mr         18      3
##   Chief       7      1
##   Justice     7      1
##   President  32      8
##   Vice        9      2
##   Bush        4      2
##   Mondale     1      1
##   Baker       1      1
## [ reached max_feat ... 3,457 more features ]
In the forthcoming 2.1.0 release (available on GitHub only as of 5 June 2020), you can use char_select() to get pattern matching on the features, e.g.:
# only in forthcoming 2.1.0 (currently on GitHub)
fcmat[, char_select(featnames(fcmat), "terror*")]
## Feature co-occurrence matrix of: 3,467 by 2 features.
##            features
## features    terror terrorism
##   Senator        2         2
##   Hatfield       1         1
##   Mr             3         3
##   Chief          1         2
##   Justice        1         2
##   President      8        10
##   Vice           2         2
##   Bush           2         2
##   Mondale        1         1
##   Baker          1         1
## [ reached max_feat ... 3,457 more features ]
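If you are on a release without char_select(), a similar glob match can probably be done on the column names with base R. The sketch below is not quanteda code: it uses a small ordinary matrix with made-up counts, and the names m, keep and sub are only for illustration.

```r
# Toy co-occurrence matrix with hypothetical counts (not real fcm output)
feats <- c("war", "terror", "terrorism", "peace")
m <- matrix(1:16, nrow = 4, dimnames = list(feats, feats))

# glob2rx() (utils) converts a glob pattern into an equivalent regular expression
keep <- grepl(glob2rx("terror*"), colnames(m))

# drop = FALSE keeps the result a matrix even if only one column matches
sub <- m[, keep, drop = FALSE]
print(colnames(sub))
```

The same logical vector could presumably be built from featnames(fcmat) and used to index the fcm, though I have only tried the pattern on a plain matrix here.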
Finally, these fcm results are easily converted into a data.frame or regular matrix for output and use in other systems, if that is what you ultimately need.
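Since the question asks for the co-occurring terms in descending order, here is a minimal sketch of the sorting step once a column is available as an ordinary matrix. The counts are invented and top_cooccurrences() is a hypothetical helper, not a quanteda function.

```r
# Toy symmetric co-occurrence matrix with hypothetical counts
feats <- c("war", "terror", "peace", "freedom")
m <- matrix(
  c(4, 2, 7, 1,
    2, 3, 5, 0,
    7, 5, 9, 6,
    1, 0, 6, 8),
  nrow = 4, byrow = TRUE,
  dimnames = list(features = feats, features = feats)
)

# Hypothetical helper: rank the features that co-occur with `feature`
top_cooccurrences <- function(mat, feature, n = 10) {
  col <- mat[, feature]               # named numeric vector for one column
  col <- col[names(col) != feature]   # drop the feature's own diagonal cell
  head(sort(col, decreasing = TRUE), n)
}

top_war <- top_cooccurrences(m, "war", n = 2)
print(top_war)
```

For a real fcm, I would expect something like as.matrix(fcmat[, "war"]) to give a dense one-column matrix that can be passed to the helper; treat that as an assumption and check it against your quanteda version. Converting only the selected column, rather than the whole 8,347 by 8,347 matrix, keeps memory use small.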
Upvotes: 0