Filippo Sebastio
Filippo Sebastio

Reputation: 1112

Iterate over columns to slice dataset

I have the following dataset: Columns named: 2,3,4...9 are filled with topic names that overlap with each other. Pageviews is an outcome variable.

        2                           3                       Pageviews
0       Financial Services          Consumer Products       4106.0
1       Consumer Products           ...                     3368.0
2       Consumer Products           ...                     1025.0
3       Collaboration               ...                     7840.0
4       Future of Supply Chains     ...                     2076.0

I would like to slice each topic column (2,3, 4, ...) together with Pageviews and append them so as to create only one dataframe with 1 topic column and Pageviews.

I am used to looping in Stata where you could loop through the name of the columns using x, but I understand it is totally different with Pyhton.

I started with

for x in range(2, 9):
    df_x = df[['Pageviews',  df.x]]

but Python does not recognize df.x

How do you loop through column names? And is it possible to use the iterator to create new dataframes?

Thanks!

EDIT

My desired output is

                                       Col        Pageviews
0                           Financial Services      4106.0
1                            Consumer Products      3368.0
2                            Consumer Products      1025.0
3                                 Collaboration     7840.0
4                      Future of Supply Chains      2076.0
5                          Future of Reporting      2123.0
6                    Sustainability Management     15576.0
7                                 Human Rights        52.0
8                                      BSR News      903.0
9                       Energy and Extractives      1232.0
10                                  HERproject       616.0
11                   Sustainability Management     10697.0

where col is the result of appending columns 2, 3, 4... and Pageviews is the result of appending the respective Pageviews columns..

Upvotes: 2

Views: 481

Answers (2)

BENY
BENY

Reputation: 323326

Using melt

df.melt('Pageviews').drop('variable',1)
Out[644]: 
    Pageviews                 value
0        1210      ConsumerProducts
1        1528         Collaboration
2        1716     FinancialServices
3        1403         Collaboration
4        1090      ConsumerProducts
5        1210      ConsumerProducts
6        1528  FutureofSupplyChains
7        1716      ConsumerProducts
8        1403     FinancialServices
9        1090  FutureofSupplyChains
10       1210     FinancialServices
11       1528     FinancialServices
12       1716         Collaboration
13       1403  FutureofSupplyChains
14       1090     FinancialServices

Upvotes: 1

sacuL
sacuL

Reputation: 51395

I think you are looking for some sort of stack method rather than iteration (generally, when working with dataframes, iteration is a last resort, as there are usually vectorized methods to achieve most data reorganization tasks).

Take the example dataframe:

>>> df
                    0                        1                        2  \
0   Consumer Products        Consumer Products       Financial Services   
1       Collaboration  Future of Supply Chains       Financial Services   
2  Financial Services        Consumer Products            Collaboration   
3       Collaboration       Financial Services  Future of Supply Chains   
4   Consumer Products  Future of Supply Chains       Financial Services   

   Pageviews  
0       1210  
1       1528  
2       1716  
3       1403  
4       1090  

You can do the following:

new_df = (df.set_index('Pageviews')
          .stack()
          .reset_index(0))

>>> new_df
    Pageviews                        0
0        1210        Consumer Products
1        1210        Consumer Products
2        1210       Financial Services
3        1528            Collaboration
4        1528  Future of Supply Chains
5        1528       Financial Services
6        1716       Financial Services
7        1716        Consumer Products
8        1716            Collaboration
9        1403            Collaboration
10       1403       Financial Services
11       1403  Future of Supply Chains
12       1090        Consumer Products
13       1090  Future of Supply Chains
14       1090       Financial Services

Upvotes: 0

Related Questions