Reputation: 23
i've a csv file of 2 columns like that : (an animal can have many class but if it present several times in the same class we must count it only one time).
animal_id class
65HddD class-a
65HddD class-a
65HddD class-b
65HddD class-c
7DHSKQ class-d
7DHSKQ class-e
7DHSKQ class-f
9JSJSZ class-g
ZDKSKS class-h
9SJSSF class-i
ZSKSSS class-j
9SJSSZ class-k
... etc
i search to obtain a dataframe like that (where the values it's the number of unique animals common at two class) :
for example the value 20 below it's the number of unique animals present both into class-a and class-b. and the value 40 the number of unique animal present only in class-a (but we don't must count more one time an animal present 2 times in same class)
class-a class-b class-c class-d class-n
class-a 40 20 VALUE
class-b
class-c
class-d
class-n
i've tried that but without success :
import pandas as pd
animals_groups = pd.read_csv("animals.csv", sep="\s+")
animals_groups.pivot_table(values="animal_id", index="class", columns="class", aggfunc="nunique")
Upvotes: 2
Views: 38
Reputation: 862661
Use DataFrame.merge
with DataFrame.pivot_table
:
df = df.merge(df, on='animal_id')
df = df.pivot_table(values="animal_id",
index="class_x",
columns="class_y",
aggfunc="nunique",
fill_value=0)
print (df)
class_y class-a class-b class-c class-d class-e class-f class-g \
class_x
class-a 1 1 1 0 0 0 0
class-b 1 1 1 0 0 0 0
class-c 1 1 1 0 0 0 0
class-d 0 0 0 1 1 1 0
class-e 0 0 0 1 1 1 0
class-f 0 0 0 1 1 1 0
class-g 0 0 0 0 0 0 1
class-h 0 0 0 0 0 0 0
class-i 0 0 0 0 0 0 0
class-j 0 0 0 0 0 0 0
class-k 0 0 0 0 0 0 0
class_y class-h class-i class-j class-k
class_x
class-a 0 0 0 0
class-b 0 0 0 0
class-c 0 0 0 0
class-d 0 0 0 0
class-e 0 0 0 0
class-f 0 0 0 0
class-g 0 0 0 0
class-h 1 0 0 0
class-i 0 1 0 0
class-j 0 0 1 0
class-k 0 0 0 1
Upvotes: 1