Kimberly Clemons
Kimberly Clemons

Reputation: 55

Stack pandas dataframe

I have a dataframe that I need to stack, melt or unpivot. For each school, I need a new row for each capacity as well as a new column for levels. The levels depends on if the capacity is >0. The city column should be included as well:

data = pd.DataFrame({'school_name': {0: 'a', 1: 'b', 2: 'c'},
                     'primary': {0: 1, 1: 3, 2: 0},
                     'secondary': {0: 2, 1: 0, 2: 6},
                     'tertiary': {0:3, 1:6, 2:0},
                     'city': {0:'Bangkok', 1:'Frankfurt', 2:'Tel Aviv'}})
data

  school_name  primary  secondary  tertiary       city
0           a        1          2         3    Bangkok
1           b        3          0         6  Frankfurt
2           c        0          6         0   Tel Aviv

Desired result:

    school_name levels     capacity    city
0   a           primary     1          Bangkok
1   a           secondary   2          Bangkok
2   a           tertiary    3          Bangkok
3   b           primary     3          Frankfurt
4   b           tertiary    6          Frankfurt
5   c           secondary   6          Tel Aviv

Upvotes: 2

Views: 186

Answers (4)

JeB
JeB

Reputation: 147

I would replace 0 value with nan since I understood it means that feature is not present

data2 = data.replace(0, np.nan)

then a melt i think is what you need. (and remove nan values)

data2.melt(id_vars= ['school_name', 'city'], value_vars=['primary', 'secondary', 'tertiary']).dropna()

   school_name  city        variable    value
0   a           Bangkok     primary     1.0
1   b           Frankfurt   primary     3.0
3   a           Bangkok     secondary   2.0
5   c           Tel Aviv    secondary   6.0
6   a           Bangkok     tertiary    3.0
7   b           Frankfurt   tertiary    6.0

if you don't like indexes reset them

data2.melt(id_vars= ['school_name', 'city'], value_vars=['primary', 'secondary', 'tertiary']).dropna().reset_index()

Upvotes: 2

sammywemmy
sammywemmy

Reputation: 28644

You could use pivot_longer from pyjanitor to reshape the data before filtering for rows greater than 0:

 import janitor

(df
 .pivot_longer(index=['school_name', 'city'], 
               names_to=("levels", ".value"), 
               names_sep="_")
 .query("capacity > 0")
 )


  school_name       city     levels  capacity
0           a    Bangkok    primary         1
1           b  Frankfurt    primary         3
3           a    Bangkok  secondary         2
5           c   Tel Aviv  secondary         6
6           a    Bangkok   tertiary         3
7           b  Frankfurt   tertiary         6

You can also use pandas' wide_to_long:

temp = df.rename(columns= lambda x: "_".join(x.split("_")[::-1])
                          if 'capacity' in x 
                          else x)

(pd.wide_to_long(temp, 
                 'capacity', 
                 ['school_name', 'city'], 
                 "levels", 
                  "_",
                  ".+")
   .query('capacity > 0').reset_index()
  )

  school_name       city     levels  capacity
0           a    Bangkok    primary         1
1           a    Bangkok  secondary         2
2           a    Bangkok   tertiary         3
3           b  Frankfurt    primary         3
4           b  Frankfurt   tertiary         6
5           c   Tel Aviv  secondary         6

Your question has been edited, so this answer should suffice:

(data.melt(['school_name', 'city'], 
           var_name='levels', 
           value_name='capacity')
    .query('capacity > 0'))

  school_name       city     levels  capacity
0           a    Bangkok    primary         1
1           b  Frankfurt    primary         3
3           a    Bangkok  secondary         2
5           c   Tel Aviv  secondary         6
6           a    Bangkok   tertiary         3
7           b  Frankfurt   tertiary         6

Upvotes: 2

Rob Raymond
Rob Raymond

Reputation: 31156

# keep columns in index
data = data.set_index(["school_name","city"])
# remaining columns, name the index
data.columns.set_names("levels",inplace=True)
# simple stack and reset index
data.stack().reset_index()

school_name city levels 0
0 a Bangkok primary 1
1 a Bangkok secondary 2
2 a Bangkok tertiary 3
3 b Frankfurt primary 3
4 b Frankfurt secondary 0
5 b Frankfurt tertiary 6
6 c Tel Aviv primary 0
7 c Tel Aviv secondary 6
8 c Tel Aviv tertiary 0

Upvotes: 0

Shubham Sharma
Shubham Sharma

Reputation: 71689

Let us stack to reshape the dataframe after masking the values which are 0 in the primary, secondary and tertiary columns:

df = data.set_index(['school_name', 'city'])
df = df[df.ne(0)].stack().reset_index(name='capacity')\
                 .rename(columns={'level_2': 'levels'})

>>> df

  school_name       city     levels   capacity
0           a    Bangkok    primary       1.0
1           a    Bangkok  secondary       2.0
2           a    Bangkok   tertiary       3.0
3           b  Frankfurt    primary       3.0
4           b  Frankfurt   tertiary       6.0
5           c   Tel Aviv  secondary       6.0

Upvotes: 2

Related Questions