In short, I'm trying to convert a DataFrame like this:
Patient  Cough  Headache  Dizzy
1        1      0         0
2        1      1         1
3        0      1         0
4        1      0         1
5        0      1         0
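into a matrix of conditional frequencies, laid out like the output of pandas' DataFrame.corr(): the cell in row A, column B is the fraction of patients with symptom A who also have symptom B.
For the data above, the result would be: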
          Cough  Headache  Dizzy
Cough     1      0.33      0.66
Headache  0.33   1         0.33
Dizzy     1      0.5       1
The matrix is not symmetric: 1 in 3 people with Headache were Dizzy (row Headache, column Dizzy = 0.33), but 1 in 2 people who were Dizzy had a Headache (row Dizzy, column Headache = 0.5), and so on.
My real dataset is much larger, so I'd like to know whether pandas has a built-in or vectorized way to do this rather than computing each cell by hand.
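To pin down exactly what each cell means, it can be computed one pair at a time by filtering rows: keep the patients who have the row symptom, then take the share of them who also have the column symptom. A minimal sketch, rebuilding the sample table by hand; the helper name conditional_rate is hypothetical and only for illustration:

```python
import pandas as pd

# Sample data from the question, one row per patient (1 = has the symptom)
df = pd.DataFrame({
    "Patient": [1, 2, 3, 4, 5],
    "Cough": [1, 1, 0, 1, 0],
    "Headache": [0, 1, 1, 0, 1],
    "Dizzy": [0, 1, 0, 1, 0],
})

def conditional_rate(df, given, target):
    """Hypothetical helper: share of patients with `given` who also have `target`."""
    has_given = df[df[given] == 1]
    return (has_given[target] == 1).mean()

symptoms = ["Cough", "Headache", "Dizzy"]
# Row = symptom we condition on, column = symptom we count
matrix = pd.DataFrame(
    [[conditional_rate(df, r, c) for c in symptoms] for r in symptoms],
    index=symptoms,
    columns=symptoms,
)
print(matrix.round(2))
```

This runs one filter per pair of symptoms, so it is meant to make the definition concrete, not to be fast on a large dataset.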
Something like this?
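# keep only the symptom columns (drop the Patient column)
s = df.iloc[:, 1:]
# co-occurrence counts divided by each symptom's total;
# row i, column j = share of patients with symptom i who also have symptom j
((s.T @ s) / s.sum()).T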
Output:
          Cough     Headache  Dizzy
Cough     1.000000  0.333333  0.666667
Headache  0.333333  1.000000  0.333333
Dizzy     1.000000  0.500000  1.000000
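Why this works: s.T @ s counts, for every pair of symptoms, how many patients have both (the diagonal holds each symptom's total). Dividing a DataFrame by the Series s.sum() aligns on columns, so column j is divided by the total for symptom j; because the co-occurrence matrix is symmetric, the final .T turns that into "divide row i by the total for symptom i". A sketch of the same computation split into steps, using DataFrame.div(..., axis=0) to divide each row by its own total and skip the transpose. It rebuilds the sample df from the question and uses drop(columns="Patient") instead of iloc[:, 1:], an assumption made so it does not depend on column order:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Patient": [1, 2, 3, 4, 5],
    "Cough": [1, 1, 0, 1, 0],
    "Headache": [0, 1, 1, 0, 1],
    "Dizzy": [0, 1, 0, 1, 0],
})

s = df.drop(columns="Patient")  # symptom indicator columns only
co = s.T @ s                    # co[i, j] = number of patients with both i and j
totals = s.sum()                # number of patients with each symptom
# Divide each row i by the total for symptom i: P(column symptom | row symptom)
cond = co.div(totals, axis=0)
print(cond)
```

The matrix product is vectorized, so it handles many rows well; the result is k by k for k symptoms. If a symptom never occurs, its total is 0 and its row becomes NaN (0/0), so decide whether to keep or fill those values for your data.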