Reputation: 57
I am trying to convert this dataframe:
data_in = {
'dates': [2017, 2017, 2018, 2019, 2019, 2019],
'names': ['Roger', 'Rafa', 'Roger', 'Rafa', 'Novak', 'Dom']
}
df_in = pd.DataFrame(data_in)
>>> df_in
dates names
0 2017 Roger
1 2017 Rafa
2 2018 Roger
3 2019 Rafa
4 2019 Novak
5 2019 Dom
into this binary matrix:
>>> df_out
Roger Rafa Novak Dom
dates
2017 1 1 0 0
2018 1 0 0 0
2019 0 1 1 1
with all the dates as index, all the names as columns, and the data being 1 if occurence of the name at the date, and np.NaN or 0 if not.
I can build the df_out dataframe with its index and columns, but how would you get the data?
Upvotes: 3
Views: 128
Reputation: 863291
Use get_dummies
with max
for 1
if exist values else 0
:
df = pd.get_dummies(df_in.set_index('dates')['names']).groupby(level=0).max()
print (df)
Dom Novak Rafa Roger
dates
2017 0 0 1 1
2018 0 0 0 1
2019 1 1 1 0
Upvotes: 5
Reputation: 61920
Use crosstab:
res = pd.crosstab(df_in['dates'], df_in['names'])
print(res)
Output
names Dom Novak Rafa Roger
dates
2017 0 0 1 1
2018 0 0 0 1
2019 1 1 1 0
If only need indicator function, do:
res = pd.crosstab(df_in['dates'], df_in['names']).gt(0).astype(int)
print(res)
Output
names Dom Novak Rafa Roger
dates
2017 0 0 1 1
2018 0 0 0 1
2019 1 1 1 0
Upvotes: 4