Efficient way to reshape a pandas dataframe

Question

I have a data frame that looks like this:

com_id    region_id       represent_name  
2233        r1               A  
2233        r2               B  
2233        r4               C  
2234        r3               D  
2234        r1               E  
2235        r1               F

There are only four possible values in region_id. For each company, there might be one representative in some region. What I want is something like this:

com_id  region1 region2 region3 region4  
2233        A      B      NaN     C  
2234        E     NaN      D     NaN  
2235        F     NaN     NaN    NaN

I have tried something like pivot table or unstack in pandas, but it is not working in this case. The problem is that each company might or might now have a representative for one region, in other words, there may be missing values in region_id for each company.

My idea so far is to create a new data frame like this:

new = pd.DataFrame(index=np.unique(df['com_id']),
                    columns=np.unique(data['region_id']))

but I am not sure how to go further.

Thanks!

JohnE · Accepted Answer

You can do it with unstack() as you attempted, but you need to set the index first:

df.set_index(['com_id','region_id']).unstack()

Out[1662]: 
          rep_name               
region_id       r1   r2   r3   r4
com_id                           
2233             A    B  NaN    C
2234             E  NaN    D  NaN

Efficient way to reshape a pandas dataframe

Answers (1)

Related Questions