Create a Document Frequency Matrix in R

Question

I am attempting to create a document frequency matrix in R.

I currently have a data frame (df_2) made up of 2 columns:

doc_num: the document each token comes from
text_token: each tokenized word belonging to that document.

The data frame's dimensions are 79,447 × 2 (79,447 rows, 2 columns).

However, those 79,447 rows cover only 400 distinct documents.
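Because each row is one (document, token) pair, the counts in a document-term matrix are just a cross-tabulation of the two columns. A minimal base-R sketch, assuming a small hypothetical df_2 with the same two columns (the toy values are illustrative, not the real data):

```r
# Hypothetical toy data in the same long format as df_2:
# one row per token, tagged with the document it came from.
df_2 <- data.frame(
  doc_num    = c("doc_1", "doc_1", "doc_2", "doc_2", "doc_2", "doc_3"),
  text_token = c("the",   "cat",   "cat",   "sat",   "cat",   "dog"),
  stringsAsFactors = FALSE
)

# Cross-tabulate document against token: each cell counts how many
# rows share that (doc_num, text_token) pair. Named arguments become
# the names of the dimnames.
counts <- table(doc = df_2$doc_num, term = df_2$text_token)

# Drop the "table" class to get a plain integer matrix with dimnames.
m <- unclass(counts)
print(m)

# Number of times "cat" appears in doc_2.
m["doc_2", "cat"]
```

This produces a dense matrix (documents × vocabulary). With 400 documents that is often manageable, but memory grows with vocabulary size, so a sparse representation may be preferable for a large vocabulary.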

I have been trying to create this dfm using the tm package.

I have tried creating a corpus (with VectorSource) and then coercing it into a dfm with the aptly named dfm() function.

However, this fails with the error "dfm() only works on character, corpus, dfm, tokens objects." I understand my data isn't in the format dfm() expects. My problem is that I don't know how to get from where I am now to a matrix like the one below.
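The error lists the input types dfm() accepts, and one of them is a tokens object. The long data therefore needs regrouping into one token vector per document first. A sketch of that step, assuming a hypothetical toy df_2; the quanteda calls (as.tokens() on a named list, then dfm() on the tokens) are guarded so the block still runs where quanteda is not installed:

```r
# Hypothetical toy data in the same long format as df_2.
df_2 <- data.frame(
  doc_num    = c("doc_1", "doc_1", "doc_2", "doc_2", "doc_2", "doc_3"),
  text_token = c("the",   "cat",   "cat",   "sat",   "cat",   "dog"),
  stringsAsFactors = FALSE
)

# Regroup the long data into one character vector of tokens per document.
# split() returns a named list whose names are the doc_num values;
# token order within each document is preserved.
docs <- split(df_2$text_token, df_2$doc_num)
str(docs)

# quanteda's as.tokens() accepts a named list of character vectors, and
# dfm() accepts the resulting tokens object.
if (requireNamespace("quanteda", quietly = TRUE)) {
  toks  <- quanteda::as.tokens(docs)
  dfmat <- quanteda::dfm(toks)
  print(dfmat)
}
```

Note that dfm() lowercases features by default, so tokens differing only in case are merged unless that option is turned off.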

Example of what I would like the matrix to look like when complete:

Here, 2 is the number of times "cat" appears in doc_2.
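Most cells of a document-term matrix are zero, so a sparse matrix is a natural container for the target shown above. A sketch using the Matrix package (a recommended package shipped with standard R installs), again on a hypothetical toy df_2. It relies on sparseMatrix() summing the x values of repeated (i, j) pairs:

```r
library(Matrix)

# Hypothetical toy data in the same long format as df_2.
df_2 <- data.frame(
  doc_num    = c("doc_1", "doc_1", "doc_2", "doc_2", "doc_2", "doc_3"),
  text_token = c("the",   "cat",   "cat",   "sat",   "cat",   "dog"),
  stringsAsFactors = FALSE
)

# Map each document and each term to an integer row/column index.
doc_f  <- factor(df_2$doc_num)
term_f <- factor(df_2$text_token)

# Give every token a weight of 1; repeated (doc, term) pairs are added
# together, so each cell ends up holding a per-document term count.
m <- sparseMatrix(
  i = as.integer(doc_f),
  j = as.integer(term_f),
  x = 1,
  dims = c(nlevels(doc_f), nlevels(term_f)),
  dimnames = list(levels(doc_f), levels(term_f))
)

# Count of "cat" in doc_2.
m["doc_2", "cat"]
```

Rows follow the sorted document IDs and columns the sorted vocabulary, because that is the default level order of factor().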

Any help on this would be greatly appreciated.

Yours respectfully.
