H_W_13

Reputation: 1

How to add a character vector as metadata/docvars to a dfm for stm prevalence

I want to add the character vector EU_CFSP_INT_all <- c(...) as metadata to my dfm, so that I can later set the prevalence to EU_CFSP_INT_all when fitting an stm. The character vector contains 62 expressions, and my corpus/dfm consists of 201 documents. It might sound trivial, but how do I include EU_CFSP_INT_all as a column in the dfm, so that all 62 expressions appear on each of the 201 rows of the dfm?

The closest I have gotten was by using the following code:

EU_CFSP_INT_all_EV <- rep_len(EU_CFSP_INT_all, length.out = 201)

dfmat_PRs_trim_c$EUint <- EU_CFSP_INT_all_EV

However, this just cycled through the 62 expressions one at a time until 201 values were reached. Accordingly, only one expression, instead of all 62, was matched with each document in the dfm.
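To make the recycling behaviour concrete, here is a minimal base R sketch. The vector `expressions` and the count `n_docs` are hypothetical stand-ins for EU_CFSP_INT_all and the 201 documents (the real vector is elided above), and collapsing the expressions into one string per document is only one possible way to get every expression onto every row, not necessarily the right input for stm:

```r
# Hypothetical stand-in for EU_CFSP_INT_all (the real vector has 62 expressions)
expressions <- c("alpha", "beta", "gamma")
n_docs <- 7  # hypothetical stand-in for the 201 documents

# rep_len() recycles element by element, so each document gets ONE expression
recycled <- rep_len(expressions, length.out = n_docs)

# Collapsing first yields a single string that holds every expression;
# repeating that string gives one identical value per document
collapsed <- rep(paste(expressions, collapse = "; "), n_docs)

print(recycled)
print(collapsed)
```

Note that a value that is identical for every document carries no variation across documents, so it may not be informative as a prevalence covariate.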

Converting the vector to a tokens object also got me closer to the goal: the resulting tokens object consists of 201 documents, each of length 62:

EU_CFSP_INT_all_vector <- rep(list(EU_CFSP_INT_all), 201)

EU_CFSP_vector_toks <- tokens(EU_CFSP_INT_all_vector)

summary(EU_CFSP_vector_toks)

But when I then created another dfm from it to merge with the original, the values got scrambled. I feel like there must be a very easy way to do this that I am unaware of. Thanks a lot if anyone can help me out!

Upvotes: 0

Views: 102

Answers (1)

Ken Benoit

Reputation: 14902

If you want to add EU_CFSP_INT_all to your tokens object as a docvar, it's simple:

docvars(EU_CFSP_vector_toks) <- EU_CFSP_INT_all

They will then remain as docvars in any dfm you create from EU_CFSP_vector_toks.

Even without that step, however, you could have specified the EU_CFSP_vector_toks as prevalence in the call to stm(), as long as you also supplied it as a data.frame in meta.

Upvotes: 0
