Bitstring Similarity Score

Question

I have a CSV file containing questionnaire data with 14000 rows. The questionnaire has multiple-response MCQs (M10, M13). For a multiple-response question such as M13, which has 8 choices, each choice the respondent selects is denoted as 1 and every other choice as 0. I would like to generate a similarity score for each bit string and replace the bit strings with it. The score should be calculated so that 00010011 and 00100011 count as more similar, since the respondent has chosen the same choices except for the third and fourth, so their scores must be closer than those of 00010011 and 00000001.

M10,M13
1111000100001000,00000001
101010000001000,00000001
111010000001000,00010011
110010000001100,00100011

This thread gives some insight into Levenshtein distance, which compares two strings. But for 14000 rows it would be a huge computational burden. Is there any other method to do it?

Answers (1)
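
For the comparison described in the question, one option that is cheaper than Levenshtein distance is the Jaccard similarity of the chosen options: the number of choices both respondents ticked, divided by the number of choices either of them ticked. The sketch below is only an illustration under a few assumptions: the function name `jaccard_similarity` is made up for this example, both bit strings are assumed to have the same length, and two all-zero strings are treated as identical. Note that it scores a pair of bit strings rather than a single one.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity of two equal-length bit strings (hypothetical helper)."""
    if len(a) != len(b):
        # Assumption: strings of different length cannot be aligned choice by choice.
        raise ValueError("bit strings must have the same length")
    x, y = int(a, 2), int(b, 2)
    # Choices ticked by at least one of the two respondents.
    union = bin(x | y).count("1")
    if union == 0:
        # Assumption: two respondents who chose nothing count as identical.
        return 1.0
    # Choices ticked by both respondents, as a share of the union.
    return bin(x & y).count("1") / union


# The two pairs from the question.
close = jaccard_similarity("00010011", "00100011")  # 2 shared choices out of 4
far = jaccard_similarity("00010011", "00000001")    # 1 shared choice out of 3
print(close, far)
```

With this measure the first pair scores higher than the second, which matches the ordering asked for. A plain count of differing bits (Hamming distance) would not separate them, since both pairs differ in exactly two positions. Each comparison is a handful of integer operations, so it is much lighter per pair than an edit-distance computation.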
