Shubham R
Shubham R

Reputation: 7644

find combinations of values in pandas dataframe column

i have a table in pandas df

 id_x             id_y
  a                 b
  b                 c
  c                 d
  d                 a
  b                 a
and so on around (1000 rows)

i want to find the total combinations for each id_x with id_y. something like chaining

ie. a has combinations with a-b,b-c,c-d similarly b has combinations(b-c,c-d,d-a) and also a-b to be considered as a combination for b( a-b = b-a)

and create a dataframe df2 which has

id   combinations  count
a          b,c,d     3
b          c,d,a     3
c          d,a,b     3
d          a,b,c     3
and so on ..(distinct product_id_'s)

and also if i could put each combinations in a different column in dataframe

id   c1  c2   c3...&so on   count
a     b   c   d               3              
b     c   d   a               3

what approach should i follow? my skills on python are at a beginner level. Thanks in advance.

Upvotes: 0

Views: 1461

Answers (1)

Arco Bast
Arco Bast

Reputation: 3892

You could try something like:

#generate dataframe    
pdf = pd.DataFrame(dict(id_x = ['a','b','c','d','b'], id_y = ['b', 'c', 'd', 'a', 'a']))

#generate second dataframe with swapped columns:
pdf_swapped = pdf.rename(columns = dict(id_x= 'id_y', id_y= 'id_x'))

#append both dataframes to each other
pdf_doubled = pd.concat([pdf, dummy_pdf])

#evaluate the frequency of each combination:
result = pdf_doubled.groupby('id_x').apply(lambda x: x.id_y.value_counts())

This gives the following result:

a     b    2
      d    1
b     a    2
      c    1
c     b    1
      d    1
d     c    1
      a    1

To figure out, how frequent the combination a-b is, you can simply do:

result['a', 'b']

Upvotes: 1

Related Questions