ashleh
ashleh

Reputation: 225

Re-structuring data in a csv file

I've wrote a script that collects a list of links from a web page, and then scrape a table containing data from each of the links which is all working fine.

My next job is to restructure the table into a more meaningful arrangement for my end goal. I'm not even really sure on the terminology of what I want to do so I'll try and explain using an example.

I have 50 odd files that has been created using my aforementioned exercise that look like this:

            Identifier, Data1, Data2, Data3
            abc,x,x,x
            def,x,x,x
            ghi,x,x,x

I want each data header to become part of the table alongside it's value, like this:

            Identifier, Measure, Value
            abc, Data1, x
            def, Data1, x
            ghi, Data1, x               
            abc, Data2, x
            def, Data2, x
            ghi, Data2, x               
            abc, Data3, x
            def, Data3, x
            ghi, Data3, x               
            abc, Data4, x
            def, Data4, x
            ghi, Data4, x               

This is pretty simple to do manually, but when working with 30-50 files it can be a little tedious (and it's a task I have to do every few months). I'm not sure it would be quicker to merge my sheets together (but some sheets contain additional headers), perform this task individually and then merge them after or even structure the data how I want it when scraping the data.

Hopefully this makes sense. Any help would be appreciated and I'll be happy to clarify anything! Thanks.

Upvotes: 0

Views: 72

Answers (1)

behzad.nouri
behzad.nouri

Reputation: 77941

You can use pandas.melt:

>>> df
  Identifier  Data1  Data2  Data3
0        abc      1      4      7
1        def      2      5      8
2        ghi      3      6      9

>>> pd.melt(df, id_vars='Identifier', 
...         value_vars=['Data1', 'Data2', 'Data3'], 
...         var_name='Measure', value_name='Value')

  Identifier Measure  Value
0        abc   Data1      1
1        def   Data1      2
2        ghi   Data1      3
3        abc   Data2      4
4        def   Data2      5
5        ghi   Data2      6
6        abc   Data3      7
7        def   Data3      8
8        ghi   Data3      9

Upvotes: 2

Related Questions