Reputation: 7644
i have a table in pandas df
id_x id_y
a b
b c
c d
d a
b a
and so on around (1000 rows)
i want to find the total combinations for each id_x with id_y. something like chaining
ie. a has combinations with a-b,b-c,c-d
similarly b has combinations(b-c,c-d,d-a) and also a-b to be considered as a combination for b( a-b = b-a)
and create a dataframe df2 which has
id combinations count
a b,c,d 3
b c,d,a 3
c d,a,b 3
d a,b,c 3
and so on ..(distinct product_id_'s)
and also if i could put each combinations in a different column in dataframe
id c1 c2 c3...&so on count
a b c d 3
b c d a 3
what approach should i follow? my skills on python are at a beginner level. Thanks in advance.
Upvotes: 0
Views: 1461
Reputation: 3892
You could try something like:
#generate dataframe
pdf = pd.DataFrame(dict(id_x = ['a','b','c','d','b'], id_y = ['b', 'c', 'd', 'a', 'a']))
#generate second dataframe with swapped columns:
pdf_swapped = pdf.rename(columns = dict(id_x= 'id_y', id_y= 'id_x'))
#append both dataframes to each other
pdf_doubled = pd.concat([pdf, dummy_pdf])
#evaluate the frequency of each combination:
result = pdf_doubled.groupby('id_x').apply(lambda x: x.id_y.value_counts())
This gives the following result:
a b 2
d 1
b a 2
c 1
c b 1
d 1
d c 1
a 1
To figure out, how frequent the combination a-b is, you can simply do:
result['a', 'b']
Upvotes: 1