Reputation: 1112
I have the following dataset: Columns named: 2,3,4...9 are filled with topic names that overlap with each other. Pageviews is an outcome variable.
2 3 Pageviews
0 Financial Services Consumer Products 4106.0
1 Consumer Products ... 3368.0
2 Consumer Products ... 1025.0
3 Collaboration ... 7840.0
4 Future of Supply Chains ... 2076.0
I would like to slice each topic column (2,3, 4, ...) together with Pageviews
and append them so as to create only one dataframe with 1 topic column and Pageviews
.
I am used to looping in Stata where you could loop through the name of the columns using x
, but I understand it is totally different with Pyhton.
I started with
for x in range(2, 9):
df_x = df[['Pageviews', df.x]]
but Python does not recognize df.x
How do you loop through column names? And is it possible to use the iterator to create new dataframes?
Thanks!
EDIT
My desired output is
Col Pageviews
0 Financial Services 4106.0
1 Consumer Products 3368.0
2 Consumer Products 1025.0
3 Collaboration 7840.0
4 Future of Supply Chains 2076.0
5 Future of Reporting 2123.0
6 Sustainability Management 15576.0
7 Human Rights 52.0
8 BSR News 903.0
9 Energy and Extractives 1232.0
10 HERproject 616.0
11 Sustainability Management 10697.0
where col is the result of appending columns 2, 3, 4... and Pageviews is the result of appending the respective Pageviews columns..
Upvotes: 2
Views: 481
Reputation: 323326
Using melt
df.melt('Pageviews').drop('variable',1)
Out[644]:
Pageviews value
0 1210 ConsumerProducts
1 1528 Collaboration
2 1716 FinancialServices
3 1403 Collaboration
4 1090 ConsumerProducts
5 1210 ConsumerProducts
6 1528 FutureofSupplyChains
7 1716 ConsumerProducts
8 1403 FinancialServices
9 1090 FutureofSupplyChains
10 1210 FinancialServices
11 1528 FinancialServices
12 1716 Collaboration
13 1403 FutureofSupplyChains
14 1090 FinancialServices
Upvotes: 1
Reputation: 51395
I think you are looking for some sort of stack
method rather than iteration (generally, when working with dataframes, iteration is a last resort, as there are usually vectorized methods to achieve most data reorganization tasks).
Take the example dataframe:
>>> df
0 1 2 \
0 Consumer Products Consumer Products Financial Services
1 Collaboration Future of Supply Chains Financial Services
2 Financial Services Consumer Products Collaboration
3 Collaboration Financial Services Future of Supply Chains
4 Consumer Products Future of Supply Chains Financial Services
Pageviews
0 1210
1 1528
2 1716
3 1403
4 1090
You can do the following:
new_df = (df.set_index('Pageviews')
.stack()
.reset_index(0))
>>> new_df
Pageviews 0
0 1210 Consumer Products
1 1210 Consumer Products
2 1210 Financial Services
3 1528 Collaboration
4 1528 Future of Supply Chains
5 1528 Financial Services
6 1716 Financial Services
7 1716 Consumer Products
8 1716 Collaboration
9 1403 Collaboration
10 1403 Financial Services
11 1403 Future of Supply Chains
12 1090 Consumer Products
13 1090 Future of Supply Chains
14 1090 Financial Services
Upvotes: 0