manikawnth
manikawnth

Reputation: 3249

Pandas DataFrame - Transpose few elements of a row into columns and fill missing data

I want to reformat a dataframe by transposing some elements of rows into columns. To provide an example of what I meant.

In the below dataframe, I want all the elements in the code column to be individual columns. And the missing rows like 'JFK 10/06 XX' should be populated as 0 or nan.

Original DataFrame:

loc     date    code
---     -----   ----
LGA     10/05   XX
LGA     10/06   XX
LGA     10/07   XX
LGA     10/05   YY
LGA     10/06   YY
LGA     10/07   YY
JFK     10/05   XX
###JFK      10/06   XX (missing)
JFK     10/07   XX
JFK     10/05   YY
JFK     10/06   YY
JFK     10/07   YY

To be formatted DataFrame:

loc     date    XX  YY
---     -----   --  --
LGA     10/05   1   1
LGA     10/06   1   1
LGA     10/07   1   1
JFK     10/05   1   1
JFK     10/06   0   1
JFK     10/07   1   1

Here JFK -> 10/06 has an entry 0 in XX

I tried grouping by with the rest of the columns and able to verify the counts, but I couldn't format it in the expected way.

Upvotes: 1

Views: 2402

Answers (3)

Nickil Maveli
Nickil Maveli

Reputation: 29719

Another solution using crosstab which computes the frequency of occurence of the values present in the columns argument:

pd.crosstab(index=[df['loc'], df['date']], columns=df['code'])  \
  .reset_index(level=1)                                         \
  .sort_index(ascending=False)

Image

Note: It's not a good practice to name the columns as 'loc' which coincidentally also is a method used by pandas for performing label based location indexing.

Upvotes: 1

NickBraunagel
NickBraunagel

Reputation: 1599

  1. stack the dataframe
  2. reset index
  3. create pivot table of counts

new_df = (df.set_index(keys=['loc','date'])
          .stack()
          .reset_index()
          .pivot_table(index=['loc','date'], columns=0, fill_value=0, aggfunc='size'))

OUTPUT:

0               XX  YY
loc date              
JFK 2016-10-05   1   1
    2016-10-06   0   1
    2016-10-07   1   1
LGA 2016-10-05   1   1
    2016-10-06   1   1
    2016-10-07   1   1

Upvotes: 0

akuiper
akuiper

Reputation: 215077

You are trying to reshape your data to wide format without a value column. One option is to use pivot_table and specify the size as the aggregate function, which will count the combinations of index and columns and fill as values. Missing values can be replaced with the fill_value parameter:

df.pivot_table(index = ['loc', 'date'], columns = 'code', 
               aggfunc='size', fill_value=0).reset_index()

#code   loc  date  XX  YY
#   0   JFK 10/05   1   1
#   1   JFK 10/06   0   1
#   2   JFK 10/07   1   1
#   3   LGA 10/05   1   1
#   4   LGA 10/06   1   1
#   5   LGA 10/07   1   1

Upvotes: 4

Related Questions