cscarlet
cscarlet

Reputation: 45

Drop columns from pandas dataframe where header contains int from a range

I have a dataframe df_alltrades with a number of similar columns where the string includes integers:

    instrument  Bid0Mkt  Bid1Mkt  Bid2Mkt  Bid3Mkt Bid4Mkt  ...
0   EURUSD      1.1      1.2      1.2      1.3     1.3
1   NZDUSD      0.6      0.65     0.7      0.9     0.92
.   .           .        .        .        .       
.   .           .        .        .        .       

I also have corresponding columns for OfferXMkt and a number of other columns formatted as str.

I want to remove any columns where X in BidXMkt and OfferXMkt is greater than 0 so that I end up with something like:

     instrument  Bid0Mkt  Offer0Mkt ...
0    EURUSD      1.1      
1    NZDUSD      0.6      
.    .           .        
.    .           .            

I can do this by writing out each if statement

for column in df_alltrades.columns:
    if '1' in column:
        df_alltrades.drop(columns=column, inplace=True)
    if '2' in column:
        df_alltrades.drop(columns=column, inplace=True)
    if '3' in column:
        df_alltrades.drop(columns=column, inplace=True)
    if '4' in column:
        df_alltrades.drop(columns=column, inplace=True)

but I was hoping to clean it up with a loop, something like:

for column in df_alltrades.columns:
    for C in range(1, 5):
        if C in column:
            df_alltrades.drop(columns=column, inplace=True)

but I'm getting the error: TypeError: 'in <string>' requires string as left operand, not int

I have also tried things like;

for X in range(1, 5):
    BidMkt = 'Bid{}Mkt'.format(X)

    df_new = df_alltrades.drop([BidMkt], axis = 1)

which even though print(BidMkt) gives the full list, only Bid4Mkt is dropped.

Thanks for any help!

Upvotes: 2

Views: 1168

Answers (3)

It_is_Chris
It_is_Chris

Reputation: 14103

Create a remove list using list comprehension and drop those columns

d = {'instrument': {0: 'EURUSD', 1: 'NZDUSD'},
 'Bid0Mkt': {0: 1.1, 1: 0.6},
 'Bid1Mkt': {0: 1.2, 1: 0.65},
 'Bid2Mkt': {0: 1.2, 1: 0.7},
 'Bid3Mkt': {0: 1.3, 1: 0.9},
 'Bid4Mkt': {0: 1.3, 1: 0.92},
 'Another': {0: 1, 1: 2}}

df = pd.DataFrame(d)

remove = [col for col in df.columns if col.startswith('Bid') and int(col[3]) > 0]
new_df = df.drop(columns=remove)

  instrument  Bid0Mkt  Another
0     EURUSD      1.1        1
1     NZDUSD      0.6        2

Upvotes: 1

bernatj
bernatj

Reputation: 147

Check what columns don't have 0 in it:

for column in df_alltrades.columns:
    if 'Bid' in column and '0' not in column:
        df_alltrades.drop(columns=column, inplace=True)

Upvotes: 1

Geom
Geom

Reputation: 1546

The error is telling you where the problem is C needs to be a string not an integer:

   for column in df_alltrades.columns:
        for C in range(1, 5):
            if str(C) in column:
                df_alltrades.drop(columns=column, inplace=True)

but if I were you I would first do this to get the headers as a list

headers = list(df.columns.values)

Filter the necessary columns from this list and then use it to slice the Dataframe like this

df = df[headers]

Upvotes: 1

Related Questions