How to use melt() and Stack() in pandas

Question

I have a dataset as such -

and I want to reshape the dataframe for analysis as such-

I went through the documentation of pandas for reshaping the dataframes and found 2 functions stack() and melt() but the documentation doesn't seem enough for what I intend to do .I need help .

Stack

Melt

Here is the code I tried ,

data.melt(id_vars=['country','Lat','Long_'])
 # I want the columns: country,Lat,Long as it is and melt

This is the error I get:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in 
      1 data.melt(id_vars=['country','Lat','Long_'])
----> 2 data.stack(1)

~\anaconda3\lib\site-packages\pandas\core\frame.py in stack(self, level, dropna)
   6999             return stack_multiple(self, level, dropna=dropna)
   7000         else:
-> 7001             return stack(self, level, dropna=dropna)
   7002 
   7003     def explode(

~\anaconda3\lib\site-packages\pandas\core\reshape\reshape.py in stack(frame, level, dropna)
    489 
    490     # Will also convert negative level numbers and check if out of bounds.
--> 491     level_num = frame.columns._get_level_number(level)
    492 
    493     if isinstance(frame.columns, MultiIndex):

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in _get_level_number(self, level)
   1413 
   1414     def _get_level_number(self, level) -> int:
-> 1415         self._validate_index_level(level)
   1416         return 0
   1417 

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in _validate_index_level(self, level)
   1404                 )
   1405             elif level > 0:
-> 1406                 raise IndexError(
   1407                     f"Too many levels: Index has only 1 level, not {level + 1}"
   1408                 )

IndexError: Too many levels: Index has only 1 level, not 2

Can you please provide me a proper function do it and get the desired result ? Thanks in advance

jezrael · Accepted Answer

I think DataFrame.melt is good idea here, only add another parameters for filter columns names and set new columns names:

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv'
data = pd.read_csv(url)

df = data.melt(id_vars=['Country_Region','Lat','Long_'], 
               value_vars=['Confirmed','Deaths','Recovered','Active'],
               var_name='Study',
               value_name='Count')
print (df)
         Country_Region        Lat       Long_      Study    Count
0           Afghanistan  33.939110   67.709953  Confirmed  50190.0
1               Albania  41.153300   20.168300  Confirmed  53814.0
2               Algeria  28.033900    1.659600  Confirmed  96069.0
3               Andorra  42.506300    1.521800  Confirmed   7633.0
4                Angola -11.202700   17.873900  Confirmed  16802.0
..                  ...        ...         ...        ...      ...
759             Vietnam  14.058324  108.277199     Active    104.0
760  West Bank and Gaza  31.952200   35.233200     Active  22953.0
761               Yemen  15.552727   48.516388     Active     97.0
762              Zambia -13.133897   27.849332     Active    612.0
763            Zimbabwe -19.015438   29.154857     Active   2059.0

[764 rows x 5 columns]

And another solution with DataFrame.stack looks like:

df = (data.set_index(['Country_Region','Lat','Long_'])[['Confirmed','Deaths','Recovered','Active']]
          .rename_axis('Study', axis=1)
          .stack()
          .reset_index(name='Count'))
print (df)

    Country_Region        Lat      Long_      Study    Count
0      Afghanistan  33.939110  67.709953  Confirmed  50190.0
1      Afghanistan  33.939110  67.709953     Deaths   2096.0
2      Afghanistan  33.939110  67.709953  Recovered  39585.0
3      Afghanistan  33.939110  67.709953     Active   8509.0
4          Albania  41.153300  20.168300  Confirmed  53814.0
..             ...        ...        ...        ...      ...
755         Zambia -13.133897  27.849332     Active    612.0
756       Zimbabwe -19.015438  29.154857  Confirmed  12544.0
757       Zimbabwe -19.015438  29.154857     Deaths    326.0
758       Zimbabwe -19.015438  29.154857  Recovered  10159.0
759       Zimbabwe -19.015438  29.154857     Active   2059.0

[760 rows x 5 columns]

How to use melt() and Stack() in pandas

Answers (1)

Related Questions