Reputation: 29
I have a pandas DataFrame that looks like this:
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
name  cards
0     ['A', 'B', 'C', 'D']
1     ['B', 'C', 'D', 'E']
2     ['E', 'F', 'G', 'H']
3     ['A', 'A', 'E', 'F']
And I'd like to create a matrix that looks like this:
name  0  1  2  3
name
0     4  3  0  1
1     3  4  1  1
2     0  1  4  2
3     1  1  2  4
Where the values are the number of items in common.
Any ideas?
Upvotes: 2
Views: 137
Reputation: 1335
A nested list comprehension that iterates over all pairs of rows produces the result:
import pandas as pd
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
result = [[len(set(x) & set(y)) for x in df['cards']] for y in df['cards']]
print(result)
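Output:
[[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 3]]
`&` computes the intersection of two sets. Because a set drops duplicates, the last row ['A', 'A', 'E', 'F'] has only 3 distinct items in common with itself, so the final diagonal entry is 3 rather than 4.
To get 4 on the diagonal, as in your expected output, subtract the larger of the two set differences from the list length instead: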
import pandas as pd
df = pd.DataFrame({'name': [0, 1, 2, 3], 'cards': [['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']]})
result=[[len(x)-max(len(set(y) - set(x)),len(set(x) - set(y))) for x in df['cards']] for y in df['cards']]
print(result)
Output:
[[4, 3, 0, 1], [3, 4, 1, 1], [0, 1, 4, 2], [1, 1, 2, 4]]
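Since the last row contains 'A' twice, another way to count duplicates is a multiset intersection. The following is only a sketch, assuming repeated cards should be matched one for one; `common_count` is a hypothetical helper name, not something from the question.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'name': [0, 1, 2, 3],
                   'cards': [['A', 'B', 'C', 'D'],
                             ['B', 'C', 'D', 'E'],
                             ['E', 'F', 'G', 'H'],
                             ['A', 'A', 'E', 'F']]})


def common_count(x, y):
    # Hypothetical helper: Counter & Counter keeps each item with the
    # smaller of its two counts, so duplicates are matched one for one.
    return sum((Counter(x) & Counter(y)).values())


result = [[common_count(x, y) for x in df['cards']] for y in df['cards']]
print(result)
```

This may matter on other data: for ['A', 'A', 'A', 'B'] and ['A', 'B', 'B', 'B'] both set differences are empty, so the subtraction formula yields 4, whereas only one 'A' and one 'B' can actually be paired.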
Upvotes: 1
Reputation: 475
import pandas as pd
import numpy as np
df = pd.DataFrame([['A', 'B', 'C', 'D'],
['B', 'C', 'D', 'E'],
['E', 'F', 'G', 'H'],
['A', 'A', 'E', 'F']])
nrows = df.shape[0]
# Initialize an nrows x nrows matrix of zeros
matrix = np.zeros((nrows, nrows), dtype=np.int64)
for i in range(nrows):
    for j in range(nrows):
        # Count the positions where column i and column j hold the same value
        matrix[i, j] = sum(df.iloc[:, i] == df.iloc[:, j])
print(matrix)
[[4 1 0 0]
[1 4 0 0]
[0 0 4 0]
[0 0 0 4]]
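Note that the loop above compares columns position by position, which is why its output differs from the matrix in the question. A possible NumPy variant that compares rows as multisets instead is sketched below; it assumes the question's `name`/`cards` layout, and the variable names (`exploded`, `counts`, `overlap`) are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': [0, 1, 2, 3],
                   'cards': [['A', 'B', 'C', 'D'],
                             ['B', 'C', 'D', 'E'],
                             ['E', 'F', 'G', 'H'],
                             ['A', 'A', 'E', 'F']]})

# One row per (name, card) pair; reset the index so it is unique
exploded = df.explode('cards').reset_index(drop=True)

# Count how many times each card appears for each name
counts = pd.crosstab(exploded['name'], exploded['cards'])
arr = counts.to_numpy()

# Broadcast every pair of rows against each other, take the smaller
# count per card, then sum over the card axis
overlap = np.minimum(arr[:, None, :], arr[None, :, :]).sum(axis=2)

matrix = pd.DataFrame(overlap, index=counts.index, columns=counts.index)
print(matrix)
```

The intermediate array has shape (rows, rows, distinct cards), so this trades memory for the absence of a Python loop and is probably only worthwhile for moderately sized frames.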
Upvotes: 0
Reputation: 1475
Using the .apply method with a lambda, we can get a DataFrame directly:
def func(df, j):
    # Size of the set intersection between list j and every list in df.cards
    return pd.Series([len(set(i) & set(j)) for i in df.cards])

newdf = df.cards.apply(lambda x: func(df, x))
newdf
0 1 2 3
0 4 3 0 1
1 3 4 1 1
2 0 1 4 2
3 1 1 2 3
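The frame above carries default integer labels. To label both axes with `name`, as in the question's expected output, one option is to pass the same index for rows and columns when building the frame. This is a minimal sketch using the set-based counts, so the last diagonal entry is 3; `names` and `labeled` are illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({'name': [0, 1, 2, 3],
                   'cards': [['A', 'B', 'C', 'D'],
                             ['B', 'C', 'D', 'E'],
                             ['E', 'F', 'G', 'H'],
                             ['A', 'A', 'E', 'F']]})

# Pairwise set-intersection sizes as a nested list
overlap = [[len(set(a) & set(b)) for b in df['cards']] for a in df['cards']]

# Use the name column as the label for both rows and columns
names = pd.Index(df['name'], name='name')
labeled = pd.DataFrame(overlap, index=names, columns=names)
print(labeled)
```

With the labels in place, a single pair can be looked up by name, for example `labeled.loc[2, 3]`.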
Upvotes: 1