Msquare

Reputation: 373

My dataframe has many (192) columns. How do I select two columns at a time?

My dataframe's columns look like df.columns = ['Time1','Pmpp1','Time2',..........,'Pmpp96']. I want to select two successive columns at a time, for example Time1 and Pmpp1 together. My code is:

for i,j in zip(df.columns,df.columns[1:]):
    print(i,j)

My present output is:

 Time1 Pmpp1
 Pmpp1 Time2
 Time2 Pmpp2

Expected output is:

 Time1 Pmpp1
 Time2 Pmpp2
 Time3 Pmpp3

Upvotes: 3

Views: 143

Answers (4)

Msquare

Reputation: 373

After a series of trials, I got it. My code is given below:

for a in range(0,len(df.columns),2):
    print(df.columns[a],df.columns[a+1]) 

My output is:

DateTime   A016.Pmp_ref
DateTime.1 A024.Pmp_ref
DateTime.2 A040.Pmp_ref
DateTime.3 A048.Pmp_ref
DateTime.4 A056.Pmp_ref
DateTime.5 A064.Pmp_ref
DateTime.6 A072.Pmp_ref
DateTime.7 A080.Pmp_ref
DateTime.8 A096.Pmp_ref
DateTime.9 A120.Pmp_ref
DateTime.10 A124.Pmp_ref
DateTime.11 A128.Pmp_ref

Upvotes: 0

jpp

Reputation: 164773

As an alternative to integer positional slicing, you can use str.startswith to create 2 index objects. Then use zip to iterate over them pairwise:

df = pd.DataFrame(columns=['Time1', 'Pmpp1', 'Time2', 'Pmpp2', 'Time3', 'Pmpp3'])

times = df.columns[df.columns.str.startswith('Time')]
pmpps = df.columns[df.columns.str.startswith('Pmpp')]

for i, j in zip(times, pmpps):
    print(i, j)

Time1 Pmpp1
Time2 Pmpp2
Time3 Pmpp3
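
Note that this pairs the two groups by position, so it relies on the Time and Pmpp columns appearing in the same order. If that is not guaranteed, one option is to match on the numeric suffix instead. A rough sketch, assuming every name is a prefix followed by digits (the helper pair_by_suffix is hypothetical, not part of pandas):

```python
import re

def pair_by_suffix(names, first='Time', second='Pmpp'):
    """Pair names such as 'Time7' and 'Pmpp7' by their shared numeric suffix."""
    found = {first: {}, second: {}}
    for name in names:
        m = re.fullmatch(r'([A-Za-z]+)(\d+)', name)
        if m and m.group(1) in found:
            found[m.group(1)][int(m.group(2))] = name
    # Keep only suffixes present in both groups, in numeric order
    common = sorted(found[first].keys() & found[second].keys())
    return [(found[first][n], found[second][n]) for n in common]

# Deliberately shuffled names; with a DataFrame you would pass df.columns
names = ['Pmpp2', 'Time1', 'Time2', 'Pmpp1', 'Time10', 'Pmpp10']
pairs = pair_by_suffix(names)
for t, p in pairs:
    print(t, p)
```

Sorting on the integer suffix also avoids the string-ordering trap where 'Time10' would sort before 'Time2'.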

Upvotes: 1

jfbeltran

Reputation: 1818

In this kind of scenario, it might make sense to reshape your DataFrame. So instead of selecting two columns at a time, you have a DataFrame with the two columns that ultimately represent your measurements.

First, you make a list of DataFrames, where each one only has a Time and Pmpp column:

dfs = []
for i in range(1,97):
    # .copy() avoids a SettingWithCopyWarning when adding the 'n' column below
    tmp = df[['Time{0}'.format(i),'Pmpp{0}'.format(i)]].copy()
    tmp.columns = ['Time', 'Pmpp']  # Standardize column names
    tmp['n'] = i                    # Remember measurement number
    dfs.append(tmp)                 # Keep with our cleaned dataframes 

Then you can concatenate them into a new DataFrame that has three columns.

new_df = pd.concat(dfs, ignore_index=True, sort=False)

This should be a much more manageable shape for your data.

>>> list(new_df.columns)
['Time', 'Pmpp', 'n']

Now you can iterate through the rows in this DataFrame and get the values for your expected output:

for i, row in new_df.iterrows():
    print(i, row.n, row.Time, row.Pmpp)

It will also make it easier to use the rest of pandas to analyze your data:

new_df.Pmpp.mean()
new_df.describe()
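
For reference, here is a small self-contained version of the reshape with two measurement pairs instead of 96. The sample values are made up for illustration, and .copy() is used so that adding the n column does not trigger a SettingWithCopyWarning.

```python
import pandas as pd

# Made-up sample data: two Time/Pmpp pairs instead of 96
df = pd.DataFrame({
    'Time1': [0, 1], 'Pmpp1': [10.0, 11.0],
    'Time2': [0, 1], 'Pmpp2': [20.0, 21.0],
})

dfs = []
for i in range(1, 3):
    tmp = df[['Time{0}'.format(i), 'Pmpp{0}'.format(i)]].copy()
    tmp.columns = ['Time', 'Pmpp']  # Standardize column names
    tmp['n'] = i                    # Remember measurement number
    dfs.append(tmp)

# Stack the per-measurement frames into one long frame
new_df = pd.concat(dfs, ignore_index=True)
print(new_df)
```

Each original row ends up once per measurement, so the result here has four rows, and the n column tells you which pair a row came from.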

Upvotes: 0

Yaniv Oliver

Reputation: 3971

You're zipping the list with the same list offset by one element, which yields overlapping pairs and is not what you want. You want to zip the elements at even indices with the elements at odd indices. For example, you could replace your code with:

for i, j in zip(df.columns[::2], df.columns[1::2]):
    print(i, j)
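
To see why the original loop produced overlapping pairs, here is a minimal sketch that uses a plain list as a stand-in for df.columns (the six names are only illustrative):

```python
# A plain list standing in for df.columns (illustrative names only)
cols = ['Time1', 'Pmpp1', 'Time2', 'Pmpp2', 'Time3', 'Pmpp3']

# Original approach: each element is paired with its immediate successor
overlapping = list(zip(cols, cols[1:]))

# Stride-2 slices: even positions paired with odd positions
pairs = list(zip(cols[::2], cols[1::2]))

for time_col, pmpp_col in pairs:
    print(time_col, pmpp_col)
```

With a real DataFrame, df[[time_col, pmpp_col]] inside the loop would select both columns of a pair at once.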

Upvotes: 5
