y2p

Correlation in text using R

My data looks like this (example):

ID     Col1     Col2
1232   ABCSD    abd
2342   ABCSD    esw
7643   ABCSD    rty
9821   ETHS     fvc

I have 2,845,428 such rows. I want to find out how strongly each pair of values from Col1 and Col2 is related, for example:

ABCSD     abd     0.64
ETHS      fvc     0.23

How can I go about it using R? Thanks

Jason B

I assume that by correlation you mean something like "what proportion of the ABCSD observations have abd in Col2?"
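Under that interpretation, the number attached to each pair is a conditional proportion. Writing $n_{ab}$ for the number of rows with Col1 = a and Col2 = b, and $n_{a}$ for the number of rows with Col1 = a (notation introduced here for illustration only):

```latex
\hat{P}(\text{Col2} = b \mid \text{Col1} = a) = \frac{n_{ab}}{n_{a}}
```

In the four sample rows, ABCSD occurs 3 times and is paired with abd once, so (ABCSD, abd) would get 1/3 ≈ 0.33, while (ETHS, fvc) would get 1/1 = 1.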

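If your data are in a data frame named df with columns Col1 and Col2,

#get the absolute frequency of each (Col1, Col2) combination
freqs <- table(df[, c("Col1", "Col2")])

#convert to relative frequency within each Col1 value (each row then sums to 1)
freqs <- prop.table(freqs, margin = 1)

#then to get the long format you want (columns Col1, Col2, Freq),
#dropping the combinations that never occur
freqs <- subset(as.data.frame(freqs), Freq > 0)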
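With ~2.8 million rows and many distinct values in Col1 and Col2, the frequency table built above can become very large, because it holds one cell for every combination of a Col1 value and a Col2 value, including combinations that never occur. A minimal base-R sketch that counts only the observed pairs instead, assuming the columns are named Col1 and Col2 as in the question (the sample data frame and the names `pairs`, `n` and `prop` are hypothetical):

```r
# Hypothetical sample data mirroring the four rows shown in the question
df <- data.frame(
  ID   = c(1232, 2342, 7643, 9821),
  Col1 = c("ABCSD", "ABCSD", "ABCSD", "ETHS"),
  Col2 = c("abd", "esw", "rty", "fvc"),
  stringsAsFactors = FALSE
)

# Count each (Col1, Col2) pair that actually occurs; no zero cells are created
pairs <- aggregate(list(n = rep(1L, nrow(df))),
                   by = df[c("Col1", "Col2")], FUN = sum)

# Divide by the total count of that Col1 value:
# the share of each Col1 value's rows that carry this Col2 value
pairs$prop <- pairs$n / ave(pairs$n, pairs$Col1, FUN = sum)

# Show the pairs grouped by Col1, strongest first
print(pairs[order(pairs$Col1, -pairs$prop), ])
```

For the sample rows, each ABCSD pair gets 1/3 and (ETHS, fvc) gets 1. On the real data you would skip the sample data frame and run the remaining statements on your own df.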
