Reputation: 431
I have a dataset as such -
and I want to reshape the dataframe for analysis as such-
I went through the documentation of pandas for reshaping the dataframes and found 2 functions
stack()
and melt()
but the documentation doesn't seem enough for what I intend to do .I need help .
Here is the code I tried ,
data.melt(id_vars=['country','Lat','Long_'])
# I want the columns: country,Lat,Long as it is and melt
This is the error I get:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-22-68d10347ddd7> in <module>
1 data.melt(id_vars=['country','Lat','Long_'])
----> 2 data.stack(1)
~\anaconda3\lib\site-packages\pandas\core\frame.py in stack(self, level, dropna)
6999 return stack_multiple(self, level, dropna=dropna)
7000 else:
-> 7001 return stack(self, level, dropna=dropna)
7002
7003 def explode(
~\anaconda3\lib\site-packages\pandas\core\reshape\reshape.py in stack(frame, level, dropna)
489
490 # Will also convert negative level numbers and check if out of bounds.
--> 491 level_num = frame.columns._get_level_number(level)
492
493 if isinstance(frame.columns, MultiIndex):
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in _get_level_number(self, level)
1413
1414 def _get_level_number(self, level) -> int:
-> 1415 self._validate_index_level(level)
1416 return 0
1417
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in _validate_index_level(self, level)
1404 )
1405 elif level > 0:
-> 1406 raise IndexError(
1407 f"Too many levels: Index has only 1 level, not {level + 1}"
1408 )
IndexError: Too many levels: Index has only 1 level, not 2
Can you please provide me a proper function do it and get the desired result ? Thanks in advance
Upvotes: 2
Views: 2557
Reputation: 862581
I think DataFrame.melt
is good idea here, only add another parameters for filter columns names and set new columns names:
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/cases_country.csv'
data = pd.read_csv(url)
df = data.melt(id_vars=['Country_Region','Lat','Long_'],
value_vars=['Confirmed','Deaths','Recovered','Active'],
var_name='Study',
value_name='Count')
print (df)
Country_Region Lat Long_ Study Count
0 Afghanistan 33.939110 67.709953 Confirmed 50190.0
1 Albania 41.153300 20.168300 Confirmed 53814.0
2 Algeria 28.033900 1.659600 Confirmed 96069.0
3 Andorra 42.506300 1.521800 Confirmed 7633.0
4 Angola -11.202700 17.873900 Confirmed 16802.0
.. ... ... ... ... ...
759 Vietnam 14.058324 108.277199 Active 104.0
760 West Bank and Gaza 31.952200 35.233200 Active 22953.0
761 Yemen 15.552727 48.516388 Active 97.0
762 Zambia -13.133897 27.849332 Active 612.0
763 Zimbabwe -19.015438 29.154857 Active 2059.0
[764 rows x 5 columns]
And another solution with DataFrame.stack
looks like:
df = (data.set_index(['Country_Region','Lat','Long_'])[['Confirmed','Deaths','Recovered','Active']]
.rename_axis('Study', axis=1)
.stack()
.reset_index(name='Count'))
print (df)
Country_Region Lat Long_ Study Count
0 Afghanistan 33.939110 67.709953 Confirmed 50190.0
1 Afghanistan 33.939110 67.709953 Deaths 2096.0
2 Afghanistan 33.939110 67.709953 Recovered 39585.0
3 Afghanistan 33.939110 67.709953 Active 8509.0
4 Albania 41.153300 20.168300 Confirmed 53814.0
.. ... ... ... ... ...
755 Zambia -13.133897 27.849332 Active 612.0
756 Zimbabwe -19.015438 29.154857 Confirmed 12544.0
757 Zimbabwe -19.015438 29.154857 Deaths 326.0
758 Zimbabwe -19.015438 29.154857 Recovered 10159.0
759 Zimbabwe -19.015438 29.154857 Active 2059.0
[760 rows x 5 columns]
Upvotes: 6