Python program to perform t-test on frequency list

Question

Hey, I have hit the following hurdle in my data analysis.

I have two frequency lists in two separate text files that look like this:

list1.txt

325 de
309 het
308 is
289 een
258 ik
208 rt
207 op
192 :
189 van
186 met
178 echt
167 en
160 in
150 dat

list2.txt

528 het
471 ik
466 een
445 de
426 is
350 dat
308 niet
273 van
239 en
227 wat
215 die
199 je
193 met
188 op
180 in
166 te
155 voor

OPTION 1: I am looking for a way, preferably in Python, to compute a t value for each word in this data. This is the formula I am trying to implement:

Pm(w) = relative frequency of word/token 'w' in list1
Pv(w) = relative frequency of word/token 'w' in list2
variance = sqrt (Pm(w) / Nm + Pv(w) / Nv)
t = ( Pm(w) - Pv(w)) / variance

Could somebody help me write a program or function that does this for me, i.e. one that takes both text files as input and produces a t value for each word/token? I'm quite lost, and this seems to be taking me forever.

Output: a new document with the t-test values and words.
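
Something along these lines is what I am after. This is only a minimal sketch of the formula above, under a few assumptions that are not stated in the formula itself: Nm and Nv are taken to be the summed counts of list1 and list2, a word missing from one list gets a relative frequency of 0 there, and each line has the form `<count> <token>`. The function names (`parse_freq_lines`, `read_freq_list`, `t_scores`, `write_scores`) are made up for illustration.

```python
import math
from collections import Counter


def parse_freq_lines(lines):
    """Turn lines of the form '<count> <token>' into a Counter."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        counts[parts[1]] += int(parts[0])
    return counts


def read_freq_list(path):
    """Read one frequency list from a text file."""
    with open(path, encoding="utf-8") as handle:
        return parse_freq_lines(handle)


def t_scores(counts_m, counts_v):
    """Return {token: t} for every token that occurs in either list.

    Assumption: Nm and Nv are the summed counts of each list.
    """
    n_m = sum(counts_m.values())
    n_v = sum(counts_v.values())
    if n_m == 0 or n_v == 0:
        raise ValueError("both frequency lists must contain at least one count")
    scores = {}
    for token in set(counts_m) | set(counts_v):
        p_m = counts_m[token] / n_m  # Pm(w); a Counter gives 0 for a missing token
        p_v = counts_v[token] / n_v  # Pv(w)
        # The quantity the formula calls "variance" (it is under a square root)
        spread = math.sqrt(p_m / n_m + p_v / n_v)
        scores[token] = (p_m - p_v) / spread
    return scores


def write_scores(scores, path):
    """Write 't<TAB>token' lines, highest t first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    with open(path, "w", encoding="utf-8") as out:
        for token, t in ranked:
            out.write(f"{t:.3f}\t{token}\n")
```

It would be called as `write_scores(t_scores(read_freq_list("list1.txt"), read_freq_list("list2.txt")), "t-values.txt")`, where `t-values.txt` is just a placeholder name for the output file.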

OPTION 2: I am also looking for a way to produce a ratio of the two counts for each word.

Input: (list1.txt and list2.txt)

Output: (list1-ratio.txt)

325 de  445 de  0.7:1
289 een  466 een  0.6:1

Output: (list2-ratio.txt)

445 de  325 de  1.3:1
466 een  289 een  1.6:1
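
For the ratio, a rough sketch of the output I mean is below. It assumes that only words present in both lists get a line (the ratio is undefined otherwise) and that the ratio is cut off after one decimal rather than rounded, since that is what matches 1.3:1 for 445/325 in my example. The function name `ratio_lines` is made up, and the two dictionaries just hold a few counts copied from the lists above.

```python
def ratio_lines(counts_a, counts_b):
    """Build 'count_a token  count_b token  ratio:1' lines.

    Assumptions: tokens missing from counts_b are skipped, and the
    ratio is truncated (not rounded) to one decimal place.
    """
    lines = []
    ranked = sorted(counts_a.items(), key=lambda item: item[1], reverse=True)
    for token, count_a in ranked:
        count_b = counts_b.get(token)
        if not count_b:
            continue  # token absent from the other list, no ratio possible
        # Integer arithmetic avoids floating-point surprises when truncating
        tenths = (count_a * 10) // count_b
        ratio = f"{tenths // 10}.{tenths % 10}:1"
        lines.append(f"{count_a} {token}  {count_b} {token}  {ratio}")
    return lines


# A few counts taken from the two lists above
list1 = {"de": 325, "het": 309, "een": 289}
list2 = {"het": 528, "een": 466, "de": 445}

for line in ratio_lines(list1, list2):
    print(line)
```

Swapping the arguments, `ratio_lines(list2, list1)`, gives the lines for list2-ratio.txt.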

Can anyone help me with this? Ideally I would use both options so that I can compare the results. This isn't homework; I'm working on sentiment analysis.

Thanks
