BeeGee

Reputation: 875

Conditionally find percent of occurrences in field 2, given field 1 in DataFrame

I am running Windows 10, Python 2.7 through the Spyder IDE.

I have a pandas DataFrame called df:

import pandas as pd

df = pd.DataFrame({'fld1': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
                   'fld2': ['a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b']})

>>> df
  fld1 fld2
0    x    a
1    x    b
2    x    c
3    x    c
4    y    a
5    y    b
6    y    c
7    z    a
8    z    b

For each value of fld1, I would like to calculate the fraction of its rows that carry each value of fld2, and save that fraction in fld3, so that the result has one row per unique combination of fld1 and fld2. The result should look like df2:

df2 = pd.DataFrame({'fld1': ['x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
                    'fld2': ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b'],
                    'fld3': [.25, .25, .50, .33, .33, .33, .5, .5]})

>>> df2
  fld1 fld2  fld3
0    x    a  0.25
1    x    b  0.25
2    x    c  0.50
3    y    a  0.33
4    y    b  0.33
5    y    c  0.33
6    z    a  0.50
7    z    b  0.50

Upvotes: 2

Views: 68

Answers (1)

jezrael

Reputation: 862771

You can use groupby with size to count each (fld1, fld2) combination, then divide by the per-fld1 sums created by transform:

print(df)

  fld1 fld2
0    x    a
1    x    b
2    x    c
3    x    c
4    y    a
5    y    b
6    y    c
7    z    a
8    z    b

# Count the rows in each (fld1, fld2) combination
g = df.groupby(['fld1', 'fld2']).size()
print(g)

fld1  fld2
x     a       1
      b       1
      c       2
y     a       1
      b       1
      c       1
z     a       1
      b       1
dtype: int64

# Divide each count by the total for its fld1 group (index level 0)
print(g / g.groupby(level=0).transform('sum'))

fld1  fld2
x     a       0.250000
      b       0.250000
      c       0.500000
y     a       0.333333
      b       0.333333
      c       0.333333
z     a       0.500000
      b       0.500000
dtype: float64
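
The result above is a Series with a two-level index, while the question asks for a flat DataFrame with an fld3 column. One way to get there, as a minimal sketch, is to call reset_index with a name for the values. The variable names counts and shares are chosen here for illustration, and the values are left unrounded (0.333333 rather than 0.33):

```python
import pandas as pd

df = pd.DataFrame({'fld1': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
                   'fld2': ['a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b']})

# Rows per (fld1, fld2) combination
counts = df.groupby(['fld1', 'fld2']).size()

# Share of each combination within its fld1 group
shares = counts / counts.groupby(level='fld1').transform('sum')

# Turn the index levels back into columns and name the values fld3
df2 = shares.reset_index(name='fld3')
print(df2)
```

If two decimals are wanted, as in the question's df2, df2['fld3'].round(2) could be applied afterwards.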

Upvotes: 2
