Reputation: 1
I want to add the character vector EU_CFSP_INT_all <- c(...) as metadata to my dfm so that, when I later fit an stm, I can set the prevalence to EU_CFSP_INT_all. The character vector includes 62 expressions and my corpus/dfm consists of 201 documents. It might sound trivial, but how do I include EU_CFSP_INT_all as a column in the dfm so that all 62 expressions appear on every one of the 201 rows?
The closest I have gotten was by using the following code:
EU_CFSP_INT_all_EV <- rep_len(EU_CFSP_INT_all, length.out = 201)
dfmat_PRs_trim_c$EUint <- EU_CFSP_INT_all_EV
However, this just cycled through the 62 expressions one at a time until 201 values were reached. As a result, only one expression, rather than all 62, was matched with each document in the dfm.
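The recycling behaviour described above can be reproduced in base R. This is only a sketch: `expressions` is a hypothetical stand-in for the real 62-element vector, and collapsing it into one string per document is an assumption about what "all 62 on every row" could mean, not something stated in the question.

```r
# Hypothetical stand-in for the 62-element character vector
expressions <- paste0("expr_", 1:62)

# rep_len() recycles element by element, so each document gets ONE expression
recycled <- rep_len(expressions, length.out = 201)
length(recycled)          # 201 values, one per document
recycled[c(1, 62, 63)]    # position 63 wraps around to the first expression

# One possible alternative (an assumption): collapse all expressions into a
# single string and repeat that string once per document
joined  <- paste(expressions, collapse = "; ")
per_doc <- rep(joined, 201)
length(per_doc)           # 201 identical strings, each holding all 62 expressions
```

Note that a value that is identical for every document carries no variation between documents.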
Converting the vector to a tokens object also got me closer to the goal: the tokens object consists of 201 documents, each of length 62:
EU_CFSP_INT_all_vector <- rep(list(EU_CFSP_INT_all), 201)
EU_CFSP_vector_toks <- tokens(EU_CFSP_INT_all_vector)
summary(EU_CFSP_vector_toks)
But when I then continued to create another dfm to merge, the values got scrambled. I feel like there must be a very easy way to do this which I am unaware of. Thanks a lot if anyone can help me out!
Upvotes: 0
Views: 102
Reputation: 14902
If you want to add EU_CFSP_INT_all to your tokens object as a docvar, it's simple:
docvars(EU_CFSP_vector_toks) <- EU_CFSP_INT_all
These will then remain as docvars in any dfm you create from EU_CFSP_vector_toks.
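As a minimal sketch of docvars carrying over from a tokens object to a dfm: the three toy documents, the two-element `expressions` vector, and the docvar name "EUint" below are all illustrative assumptions, and the expressions are collapsed into one string so that there is exactly one value per document.

```r
library(quanteda)

# Toy documents (hypothetical; the real corpus has 201 documents)
toks <- tokens(c(doc1 = "the council adopted new sanctions",
                 doc2 = "the union issued a joint statement",
                 doc3 = "ministers discussed security policy"))

# Hypothetical stand-in for EU_CFSP_INT_all
expressions <- c("common foreign and security policy", "high representative")

# Assign one value per document to a named docvar
docvars(toks, "EUint") <- rep(paste(expressions, collapse = "; "), ndoc(toks))

# The docvar is carried over when the dfm is built from the tokens
dfmat <- dfm(toks)
docvars(dfmat, "EUint")
```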
Even without that step, however, you could have specified the EU_CFSP_vector_toks as prevalence in the call to stm(), as long as you also supplied it as a data.frame in meta.
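For reference, a common pattern for passing document-level covariates to stm() from a quanteda dfm looks roughly like the sketch below. It uses quanteda's built-in inaugural corpus and its Party and Year docvars purely as an illustration, not the asker's data, and the values chosen for K, max.em.its, and the trimming threshold are arbitrary assumptions to keep the run short.

```r
library(quanteda)
library(stm)

# Built-in example corpus with document-level variables (Year, Party, ...)
dfmat <- dfm(tokens(data_corpus_inaugural, remove_punct = TRUE))
dfmat <- dfm_trim(dfmat, min_termfreq = 5)  # drop rare terms to speed things up

# prevalence is a one-sided formula; its variables are looked up in `data`,
# here the docvars data.frame of the dfm
fit <- stm(dfmat,
           K = 5,
           prevalence = ~ Party + Year,
           data = docvars(dfmat),
           max.em.its = 5,     # deliberately few iterations for a quick sketch
           seed = 1,
           verbose = FALSE)

dim(fit$theta)  # documents x topics
```

A prevalence covariate is only informative if it varies across documents, so a value that is the same on every row would not contribute anything here.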
Upvotes: 0