Eli Smith

Reputation: 119

Extract certain columns from an existing data frame using a for loop in Python

I have a data frame with more than 500 columns, and I want to create a new data frame from it by extracting certain columns.

I want to extract 'Time', 'GA01010_01', 'GA01020_01', 'GA01030_01', 'GA01040_01', 'GA01050_01', 'GA01060_01', 'GA01070_01', 'GA01080_01' from the data frame. My current code returns only two columns, where it is meant to return 8 or 5 columns depending on the cylinder count.

Here is the code I've got so far:

engineCount = 4
cylinderCount = [8, 5, 5, 8]
for no in range(engineCount):
    for i in range(len(cylinderCount)):
        tagNo = "10" + str(i+1) + "0"
        tagStr = "GA0" + str(tagNo)
    engineNo = str(no+1)
    df1 = df[['Time', str(tagStr) + '_0' + str(engineNo)]]

When I run the above code, I get the result below:

  Time       GA01040_04
2021-11-01         72
2021-11-02         58
2021-11-03         66
2021-11-04         73
2021-11-05         52

but the expected output is as below:

    Time       GA01040_01  GA01020_01  GA01030_01 ..... GA01080_01
2021-11-01         72         55           33     .....     22
2021-11-02         58         35           27     .....     35  
2021-11-03         66         36           66     .....     77
2021-11-04         73         78           65     .....     66
2021-11-05         52         63           35     .....     68 

Upvotes: 0

Views: 304

Answers (1)

user7864386


You have two problems here.

(i) You never save the column names to a list, so the output you get comes from the last column name generated in the loop, which is GA01040_04.

(ii) The loop never produces your desired column names in the first place: the inner loop runs over len(cylinderCount), which is 4, rather than over each engine's cylinder count.
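To see both problems concretely, here is a small sketch that replays the question's loop without touching a data frame and records the name it ends up with on each pass. The `generated` list is introduced only for illustration and is not part of the original code.

```python
engineCount = 4
cylinderCount = [8, 5, 5, 8]

generated = []  # illustration only: the name left over after each inner loop
for no in range(engineCount):
    for i in range(len(cylinderCount)):  # len(...) is 4, not the cylinder count
        tagStr = "GA0" + "10" + str(i + 1) + "0"
    # only the last tagStr from the inner loop survives to this point
    generated.append(tagStr + "_0" + str(no + 1))

print(generated)
# ['GA01040_01', 'GA01040_02', 'GA01040_03', 'GA01040_04']
```

Each pass overwrites df1 in the original code, so only the final name, GA01040_04, is ever visible.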

Try the following:

cylinderCount = [8,5,5,8]
first, second, third, fourth = [['GA010{}0_0{}'.format(i, cyl_idx) for i in range(1, cyl+1)] for cyl_idx, cyl in enumerate(cylinderCount, 1)]

The list comprehension iterates over cylinderCount in the outer loop and, for each engine, builds a list with as many column names as that engine has cylinders. So engine 1 gets 8 column names, engine 2 gets 5, and so on.

Then, for example, to get the data for the first engine, use
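As a quick check, the same comprehension can be collected into a single list and inspected before it is used to index the data frame. The name `columns_per_engine` is an assumption made for this sketch.

```python
cylinderCount = [8, 5, 5, 8]

# one list of column names per engine; enumerate(..., 1) numbers engines from 1
columns_per_engine = [
    ['GA010{}0_0{}'.format(i, engine) for i in range(1, cyl + 1)]
    for engine, cyl in enumerate(cylinderCount, 1)
]

# print engine number, how many names it has, and its first and last name
for engine, names in enumerate(columns_per_engine, 1):
    print(engine, len(names), names[0], names[-1])
# 1 8 GA01010_01 GA01080_01
# 2 5 GA01010_02 GA01050_02
# 3 5 GA01010_03 GA01050_03
# 4 8 GA01010_04 GA01080_04
```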

df1 = df[['Time'] + first]

To get the data of the second engine,

df2 = df[['Time'] + second]

etc.
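If you would rather not name one variable per engine, a dictionary keyed by engine number is one option. This is only a sketch: the sample `df` below is made-up data standing in for your real frame, and `all_cols` and `frames` are hypothetical names.

```python
import pandas as pd

cylinderCount = [8, 5, 5, 8]

# made-up stand-in for the real data frame: 'Time' plus every engine/cylinder column
all_cols = ['GA010{}0_0{}'.format(i, e)
            for e, cyl in enumerate(cylinderCount, 1)
            for i in range(1, cyl + 1)]
df = pd.DataFrame({'Time': pd.date_range('2021-11-01', periods=5)})
for n, col in enumerate(all_cols):
    df[col] = list(range(n, n + 5))

# one sub-frame per engine, keyed by engine number
frames = {}
for e, cyl in enumerate(cylinderCount, 1):
    cols = ['GA010{}0_0{}'.format(i, e) for i in range(1, cyl + 1)]
    frames[e] = df[['Time'] + cols]

print(frames[1].shape)  # (5, 9): 'Time' plus 8 cylinder columns
print(frames[2].shape)  # (5, 6): 'Time' plus 5 cylinder columns
```

Here frames[1] holds the same columns as df1 above, frames[2] the same as df2, and so on.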

Upvotes: 1
