Conditionally find percent of occurrences in field 2, given field 1 in DataFrame

Question

I am running Windows 10, Python 2.7 through the Spyder IDE.

I have a pandas DataFrame called df:

import pandas as pd

df = pd.DataFrame({'fld1': ['x', 'x', 'x','x','y','y','y','z','z']
                , 'fld2': ['a', 'b', 'c','c','a','b','c','a','b']})

>>> df
  fld1 fld2
0    x    a
1    x    b
2    x    c
3    x    c
4    y    a
5    y    b
6    y    c
7    z    a
8    z    b

For each value of fld1, I would like to calculate the fraction of its rows that each fld2 value accounts for, and save that fraction in fld3, so that the result has one row per unique combination of fld1 and fld2. The result should look like df2:

df2 = pd.DataFrame({'fld1': ['x', 'x', 'x','y','y','y','z','z']
                , 'fld2': ['a', 'b', 'c','a','b','c','a','b']
                , 'fld3': [.25,.25,.50,.33,.33,.33,.5,.5]})
>>> df2
  fld1 fld2  fld3
0    x    a  0.25
1    x    b  0.25
2    x    c  0.50
3    y    a  0.33
4    y    b  0.33
5    y    c  0.33
6    z    a  0.50
7    z    b  0.50

jezrael · Accepted Answer

You can use groupby and size to count each (fld1, fld2) pair, then divide by the per-fld1 sums created by transform:

print df

  fld1 fld2
0    x    a
1    x    b
2    x    c
3    x    c
4    y    a
5    y    b
6    y    c
7    z    a
8    z    b

g = df.groupby(['fld1', 'fld2'])['fld1'].size()
print g

fld1  fld2
x     a       1
      b       1
      c       2
y     a       1
      b       1
      c       1
z     a       1
      b       1
dtype: int64

print g / g.groupby(level=0).transform(sum)

fld1  fld2
x     a       0.250000
      b       0.250000
      c       0.500000
y     a       0.333333
      b       0.333333
      c       0.333333
z     a       0.500000
      b       0.500000
dtype: float64
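The result above is a Series with a two-level index, not yet the flat df2 from the question. A minimal sketch of one way to get there, assuming the same df as above: reset_index(name=...) turns the index levels back into columns and names the value column. The variable names counts and shares are only illustrative, the string 'sum' is used in place of the builtin sum, and the rounding to two decimals is there purely to match the df2 shown in the question.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'fld1': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
                   'fld2': ['a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b']})

# Number of rows for each (fld1, fld2) pair.
counts = df.groupby(['fld1', 'fld2']).size()

# Divide each count by the total for its fld1 group (index level 0).
shares = counts / counts.groupby(level=0).transform('sum')

# Flatten the MultiIndex into columns and name the value column fld3.
# round(2) only mirrors the question's display; drop it to keep full precision.
df2 = shares.round(2).reset_index(name='fld3')

print(df2)
```

If the full-precision fractions are needed later, keep shares around and round only when displaying.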
