I am running Windows 10, Python 2.7 through the Spyder IDE.
I have a pandas DataFrame called df:
import pandas as pd
df = pd.DataFrame({'fld1': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
                   'fld2': ['a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b']})
>>> df
fld1 fld2
0 x a
1 x b
2 x c
3 x c
4 y a
5 y b
6 y c
7 z a
8 z b
For each value of fld1, I would like to calculate the fraction of its rows that each fld2 value accounts for, and save that fraction in fld3, so that the result has one row per unique combination of fld1 and fld2. The result should look like df2:
df2 = pd.DataFrame({'fld1': ['x', 'x', 'x','y','y','y','z','z']
, 'fld2': ['a', 'b', 'c','a','b','c','a','b']
, 'fld3': [.25,.25,.50,.33,.33,.33,.5,.5]})
>>> df2
fld1 fld2 fld3
0 x a 0.25
1 x b 0.25
2 x c 0.50
3 y a 0.33
4 y b 0.33
5 y c 0.33
6 z a 0.50
7 z b 0.50
You can use groupby with size to count each (fld1, fld2) combination, and then divide those counts by the per-fld1 sums created with transform:
print(df)
fld1 fld2
0 x a
1 x b
2 x c
3 x c
4 y a
5 y b
6 y c
7 z a
8 z b
g = df.groupby(['fld1', 'fld2'])['fld1'].size()
print(g)
fld1  fld2
x     a       1
      b       1
      c       2
y     a       1
      b       1
      c       1
z     a       1
      b       1
dtype: int64
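To see what the divisor in the next step looks like, here is a small sketch (assuming the df from the question and a pandas version whose transform accepts the string 'sum'). groupby(level=0) groups g by its first index level, fld1, and transform('sum') broadcasts each group's total back onto every row, so each count ends up divided by the number of rows for its fld1 value (4 for x, 3 for y, 2 for z). It uses df.groupby([...]).size(), which gives the same counts as the ['fld1'].size() call above.

```python
import pandas as pd

df = pd.DataFrame({'fld1': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
                   'fld2': ['a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b']})

# Row counts for each (fld1, fld2) pair, indexed by a two-level MultiIndex
g = df.groupby(['fld1', 'fld2']).size()

# Total rows per fld1 value, repeated for every fld2 row under it,
# so it lines up index-for-index with g
totals = g.groupby(level=0).transform('sum')
print(totals)
```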
print(g / g.groupby(level=0).transform('sum'))
fld1  fld2
x     a       0.250000
      b       0.250000
      c       0.500000
y     a       0.333333
      b       0.333333
      c       0.333333
z     a       0.500000
      b       0.500000
dtype: float64
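The result above is a Series indexed by (fld1, fld2). To get the flat shape of df2 from the question, with the fraction in a column named fld3, one option is to reset the index. This is a sketch assuming the same df and a pandas version where Series.reset_index accepts name= to label the value column.

```python
import pandas as pd

df = pd.DataFrame({'fld1': ['x', 'x', 'x', 'x', 'y', 'y', 'y', 'z', 'z'],
                   'fld2': ['a', 'b', 'c', 'c', 'a', 'b', 'c', 'a', 'b']})

# Row counts per (fld1, fld2) combination
g = df.groupby(['fld1', 'fld2']).size()

# Divide each count by its fld1 group's total, then turn the
# MultiIndex levels back into ordinary fld1/fld2 columns
df2 = (g / g.groupby(level=0).transform('sum')).reset_index(name='fld3')
print(df2)
```

If you want the two-decimal values shown in df2, you can round the column afterwards, e.g. df2['fld3'] = df2['fld3'].round(2).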