Ammar Kamran
Ammar Kamran

Reputation: 139

Python change row values from DataFrame based on condition

Below is my DataFrame:

df = pd.DataFrame({'University':['4', '0', '3'],'Subject':['1', '2', '0']})

Just like 'University' & 'Subject' I have 50 other columns that end with the

column 'Score':['3', '1', '0'].

I would like to change the values for all 50 columns starting from the University all the way to Score. The condition should be if a value is => 1 then change them to 1 otherwise all other values should be zero.

I know that we are able to do it with this method:

df.loc[dt['University'] > 1] = 1

However, above I will have to type each and every one of my 50 columns. Is there a way where I could specifically mention starting from 'University' and ending at 'Score' which should include all the columns in between?

I am new to python so please try to be as clear as possible. Thank you.

Upvotes: 1

Views: 2524

Answers (3)

Nour-Allah Hussein
Nour-Allah Hussein

Reputation: 1459

for any number of columns except the last one score

for i in df.columns[:-1]:
    df[i]=df[i].apply(lambda x: 1 if int(x)>=1 else 0)

Upvotes: 0

LazyEval
LazyEval

Reputation: 859

This could work:

df_new = pd.DataFrame(np.where(df.values >= 1, 1, 0), columns=df.columns)

EDIT: This is not very clean but it might do the job:

df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(lambda x: (x >= 1).astype('int'))

EDIT2: For more conditions I would do something like this, but again I don't think this is optimal:

def conditions(row):
    mask1 = (row.values >= 1) & (row.values < 5)
    mask2 = (row.values >= 5) & (row.values < 10)
    mask3 = (row.values >= 10)
    
    row = mask1.astype('int') + mask2.astype('int')*2 + mask3.astype('int')*3
    return row

df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(conditions)

Apply is quite slow in general, and though it might not be an issue for your case, I would probably opt to select all columns of the df and drop the ones you don't want to include in the operation and use .loc (instead of trying to get the range of columns).

Upvotes: 1

vladsiv
vladsiv

Reputation: 2946

I think this could help you:

df = pd.DataFrame(
    {
        "University": ["4", "0", "3"],
        "Subject": ["1", "2", "0"],
        "Test1": ["1", "2", "0"],
        # Other columns
        "Score": ["3", "0", "4"],
    }
)

# Change the type, since you need int
df = df.astype(int)

for column in df.columns[:-1]:
    df.loc[df[column] > 1, column] = 1
print(df.head())

Which gives:

   University  Subject  Test1  Score
0           1        1      1      3
1           0        1      1      0
2           1        0      0      4

I believe that's what you want if I understood you correctly.

Upvotes: 1

Related Questions