Python change row values from DataFrame based on condition

Below is my DataFrame:

df = pd.DataFrame({'University':['4', '0', '3'],'Subject':['1', '2', '0']})

Just like 'University' & 'Subject' I have 50 other columns that end with the

column 'Score':['3', '1', '0'].

I would like to change the values for all 50 columns starting from the University all the way to Score. The condition should be if a value is => 1 then change them to 1 otherwise all other values should be zero.

I know that we are able to do it with this method:

df.loc[dt['University'] > 1] = 1

However, above I will have to type each and every one of my 50 columns. Is there a way where I could specifically mention starting from 'University' and ending at 'Score' which should include all the columns in between?

I am new to python so please try to be as clear as possible. Thank you.

Upvotes: 1

Answers (3)

Nour-Allah Hussein

Reputation: 1459

for any number of columns except the last one score

for i in df.columns[:-1]:
    df[i]=df[i].apply(lambda x: 1 if int(x)>=1 else 0)

Upvotes: 0

LazyEval

Reputation: 859

This could work:

df_new = pd.DataFrame(np.where(df.values >= 1, 1, 0), columns=df.columns)

EDIT: This is not very clean but it might do the job:

df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(lambda x: (x >= 1).astype('int'))

EDIT2: For more conditions I would do something like this, but again I don't think this is optimal:

def conditions(row):
    mask1 = (row.values >= 1) & (row.values < 5)
    mask2 = (row.values >= 5) & (row.values < 10)
    mask3 = (row.values >= 10)
    
    row = mask1.astype('int') + mask2.astype('int')*2 + mask3.astype('int')*3
    return row

df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(conditions)

Apply is quite slow in general, and though it might not be an issue for your case, I would probably opt to select all columns of the df and drop the ones you don't want to include in the operation and use .loc (instead of trying to get the range of columns).

Upvotes: 1

vladsiv

Reputation: 2946

I think this could help you:

df = pd.DataFrame(
    {
        "University": ["4", "0", "3"],
        "Subject": ["1", "2", "0"],
        "Test1": ["1", "2", "0"],
        # Other columns
        "Score": ["3", "0", "4"],
    }
)

# Change the type, since you need int
df = df.astype(int)

for column in df.columns[:-1]:
    df.loc[df[column] > 1, column] = 1
print(df.head())

Which gives:

   University  Subject  Test1  Score
0           1        1      1      3
1           0        1      1      0
2           1        0      0      4

I believe that's what you want if I understood you correctly.

Upvotes: 1

Python change row values from DataFrame based on condition

Answers (3)

Related Questions