Reputation: 383
I have the following data:
sentences = [{'mary':'N', 'jane':'N', 'can':'M', 'see':'V','will':'N'},
{'spot':'N','will':'M','see':'V','mary':'N'},
{'will':'M','jane':'N','spot':'V','mary':'N'},
{'mary':'N','will':'M','pat':'V','spot':'N'}]
I want to create a data frame where each key (from the pairs above) will be the column name and each value (from above) will be the index of the row. The values in the data frame will be counting of each matching point between the key and the value.
The expected result should be:
df = pd.DataFrame([(4,0,0),
(2,0,0),
(0,1,0),
(0,0,2),
(1,3,0),
(2,0,1),
(0,0,1)],
index=['mary', 'jane', 'can', 'see', 'will', 'spot', 'pat'],
columns=('N','M','V'))
Upvotes: 1
Views: 47
Reputation: 862921
Use value_counts
per columns in DataFrame.apply
, replace missing values, convert to integers and last transpose by DataFrame.T
:
df = df.apply(pd.value_counts).fillna(0).astype(int).T
print (df)
M N V
mary 0 3 1
jane 0 2 0
can 1 0 0
see 0 0 2
will 3 1 0
spot 0 2 1
pat 0 0 1
Or use DataFrame.stack
with SeriesGroupBy.value_counts
and Series.unstack
:
df = df.stack().groupby(level=1).value_counts().unstack(fill_value=0)
print (df)
M N V
can 1 0 0
jane 0 2 0
mary 0 3 1
pat 0 0 1
see 0 0 2
spot 0 2 1
will 3 1 0
Upvotes: 3
Reputation: 11927
pd.DataFrame(sentences).T.stack().groupby(level=0).value_counts().unstack().fillna(0)
M N V
can 1.0 0.0 0.0
jane 0.0 2.0 0.0
mary 0.0 3.0 1.0
pat 0.0 0.0 1.0
see 0.0 0.0 2.0
spot 0.0 2.0 1.0
will 3.0 1.0 0.0
Cast as int if needed to.
pd.DataFrame(sentences).T.stack().groupby(level=0).value_counts().unstack().fillna(0).cast("int")
Upvotes: 2