Reputation: 3194
I have a data frame that looks like this:
com_id region_id represent_name
2233 r1 A
2233 r2 B
2233 r4 C
2234 r3 D
2234 r1 E
2235 r1 F
There are only four possible values in region_id
. For each company, there might be one representative in some region. What I want is something like this:
com_id region1 region2 region3 region4
2233 A B NaN C
2234 E NaN D NaN
2235 F NaN NaN NaN
I have tried something like pivot table or unstack in pandas, but it is not working in this case. The problem is that each company might or might now have a representative for one region, in other words, there may be missing values in region_id
for each company.
My idea so far is to create a new data frame like this:
new = pd.DataFrame(index=np.unique(df['com_id']),
columns=np.unique(data['region_id']))
but I am not sure how to go further.
Thanks!
Upvotes: 1
Views: 705
Reputation: 30444
You can do it with unstack() as you attempted, but you need to set the index first:
df.set_index(['com_id','region_id']).unstack()
Out[1662]:
rep_name
region_id r1 r2 r3 r4
com_id
2233 A B NaN C
2234 E NaN D NaN
Upvotes: 1