Reputation: 139
Below is my DataFrame:
df = pd.DataFrame({'University':['4', '0', '3'],'Subject':['1', '2', '0']})
Just like 'University' & 'Subject' I have 50 other columns that end with the
column 'Score':['3', '1', '0'].
I would like to change the values for all 50 columns starting from the University all the way to Score. The condition should be if a value is => 1 then change them to 1 otherwise all other values should be zero.
I know that we are able to do it with this method:
df.loc[dt['University'] > 1] = 1
However, above I will have to type each and every one of my 50 columns. Is there a way where I could specifically mention starting from 'University' and ending at 'Score' which should include all the columns in between?
I am new to python so please try to be as clear as possible. Thank you.
Upvotes: 1
Views: 2524
Reputation: 1459
for any number of columns except the last one score
for i in df.columns[:-1]:
df[i]=df[i].apply(lambda x: 1 if int(x)>=1 else 0)
Upvotes: 0
Reputation: 859
This could work:
df_new = pd.DataFrame(np.where(df.values >= 1, 1, 0), columns=df.columns)
EDIT: This is not very clean but it might do the job:
df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(lambda x: (x >= 1).astype('int'))
EDIT2: For more conditions I would do something like this, but again I don't think this is optimal:
def conditions(row):
mask1 = (row.values >= 1) & (row.values < 5)
mask2 = (row.values >= 5) & (row.values < 10)
mask3 = (row.values >= 10)
row = mask1.astype('int') + mask2.astype('int')*2 + mask3.astype('int')*3
return row
df.loc[:, 'University':'Score'] = df.loc[:, 'University':'Score'].apply(conditions)
Apply is quite slow in general, and though it might not be an issue for your case, I would probably opt to select all columns of the df and drop the ones you don't want to include in the operation and use .loc
(instead of trying to get the range of columns).
Upvotes: 1
Reputation: 2946
I think this could help you:
df = pd.DataFrame(
{
"University": ["4", "0", "3"],
"Subject": ["1", "2", "0"],
"Test1": ["1", "2", "0"],
# Other columns
"Score": ["3", "0", "4"],
}
)
# Change the type, since you need int
df = df.astype(int)
for column in df.columns[:-1]:
df.loc[df[column] > 1, column] = 1
print(df.head())
Which gives:
University Subject Test1 Score
0 1 1 1 3
1 0 1 1 0
2 1 0 0 4
I believe that's what you want if I understood you correctly.
Upvotes: 1