Reputation: 225
I've wrote a script that collects a list of links from a web page, and then scrape a table containing data from each of the links which is all working fine.
My next job is to restructure the table into a more meaningful arrangement for my end goal. I'm not even really sure on the terminology of what I want to do so I'll try and explain using an example.
I have 50 odd files that has been created using my aforementioned exercise that look like this:
Identifier, Data1, Data2, Data3
abc,x,x,x
def,x,x,x
ghi,x,x,x
I want each data header to become part of the table alongside it's value, like this:
Identifier, Measure, Value
abc, Data1, x
def, Data1, x
ghi, Data1, x
abc, Data2, x
def, Data2, x
ghi, Data2, x
abc, Data3, x
def, Data3, x
ghi, Data3, x
abc, Data4, x
def, Data4, x
ghi, Data4, x
This is pretty simple to do manually, but when working with 30-50 files it can be a little tedious (and it's a task I have to do every few months). I'm not sure it would be quicker to merge my sheets together (but some sheets contain additional headers), perform this task individually and then merge them after or even structure the data how I want it when scraping the data.
Hopefully this makes sense. Any help would be appreciated and I'll be happy to clarify anything! Thanks.
Upvotes: 0
Views: 72
Reputation: 77941
You can use pandas.melt
:
>>> df
Identifier Data1 Data2 Data3
0 abc 1 4 7
1 def 2 5 8
2 ghi 3 6 9
>>> pd.melt(df, id_vars='Identifier',
... value_vars=['Data1', 'Data2', 'Data3'],
... var_name='Measure', value_name='Value')
Identifier Measure Value
0 abc Data1 1
1 def Data1 2
2 ghi Data1 3
3 abc Data2 4
4 def Data2 5
5 ghi Data2 6
6 abc Data3 7
7 def Data3 8
8 ghi Data3 9
Upvotes: 2