Paula Pipkin

Reputation: 51

I would like to create a new column based on conditions using .loc

I have the code:

to_test['averageRating'].unique()
array([5.8, 5.2, 5. , 6.5, 5.5, 7.3, 7.2, 4.2, 6.4, 7.1, 6.6, 5.4, 6.9, 6. , 6.1, 8.1, 6.3, 7.8, 3.9, 6.8, 6.2, 7.9, 7. , 4.9, 5.9, 7.5, 6.7, 8. , 5.7, 3.2, 4.8, 5.6, 7.4, 4.5, 3.6, 4.3, 3.4, 5.1, 4.4, 4.7, 7.7, 5.3, 4. , 8.4, 7.6, 3.3, 2.2, 3.7, 8.2, 4.1, 8.3, 1.7, 9. , 4.6, 8.5, 3.1, 3.8, 3.5, 1.9, 2.9, 2.8, 2.7, 9.2, 1.2, 2.1, 3. , 1.3, 1.1, 8.6, 2.5, 1. , 9.8, 8.7, 1.5, 9.3])

# create a list of our conditions
conditions = [
    (to_test.loc[(to_test['averageRating'] >= 0.0) & (to_test['averageRating'] <= 3.3)]),
    (to_test.loc[(to_test['averageRating'] >= 3.4) & (to_test['averageRating'] <= 6.6)]),
    (to_test.loc[(to_test['averageRating'] >= 6.7) & (to_test['averageRating'] <= 10)]),
]

# create a list of the values we want to assign for each condition
values = ['group1', 'group2', 'group3']

# create a new column and use np.select to assign values to it using our lists as arguments
to_test['group'] = np.select(conditions, values)

# display updated DataFrame
to_test.head()

but it's not working

Upvotes: 0

Views: 65

Answers (1)

sayan dasgupta

Reputation: 1082

This is a classic use case for pd.cut. Sample code:

df = pd.DataFrame({'averageRating' : np.random.uniform(0,10,100)})
df['group_using_cut'] = pd.cut(df['averageRating'],
                               [0,3.3,6.6,10],
                               labels=['group1','group2','group3'])
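One thing worth checking with pd.cut is what happens exactly on the bin edges. As far as I understand the defaults, bins are closed on the right, so the edges above give (0, 3.3], (3.3, 6.6] and (6.6, 10], and a rating of exactly 0 would fall outside the first bin unless include_lowest=True is passed. Here is a small sketch of that; the sample ratings are made up to sit on the edges and are not from the question's data:

```python
import pandas as pd

# Hypothetical ratings chosen to sit on and around the bin edges.
df = pd.DataFrame({'averageRating': [0.0, 1.0, 3.3, 3.4, 6.6, 6.7, 10.0]})

# right=True (the default) closes each bin on the right:
# (0, 3.3], (3.3, 6.6], (6.6, 10].
# include_lowest=True also keeps a rating of exactly 0 in the first bin.
df['group'] = pd.cut(
    df['averageRating'],
    bins=[0, 3.3, 6.6, 10],
    labels=['group1', 'group2', 'group3'],
    include_lowest=True,
)

print(df)
```

If a rating of 0 can never occur in your data, include_lowest makes no difference and can be left out.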

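If you want to use np.select, pass the conditions as boolean masks without .loc. np.select expects one boolean array per condition, whereas .loc[...] returns the filtered rows rather than a mask. Sample code: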

conds = [
    (df['averageRating'] >= 0.0) & (df['averageRating'] <= 3.3),
    (df['averageRating'] >= 3.4) & (df['averageRating'] <= 6.6),
    (df['averageRating'] >= 6.7) & (df['averageRating'] <= 10),
]
df['group_using_select'] = np.select(conds, ['group1', 'group2', 'group3'])
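Two details of the np.select version may be worth guarding against. The conditions leave small gaps (for example between 3.3 and 3.4), which is harmless for ratings with one decimal but not for continuous values, and rows matching no condition get np.select's default, which is 0 unless you set it. Mixing an integer default with string labels is probably best avoided, so the sketch below passes a string default instead. The 'ungrouped' label and the sample ratings are my own placeholders, not something from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical sample; 3.35 falls in the gap between the first two conditions.
df = pd.DataFrame({'averageRating': [1.0, 3.3, 3.35, 5.0, 6.7, 9.8]})
rating = df['averageRating']

# Each condition is a boolean Series with one entry per row.
conds = [
    (rating >= 0.0) & (rating <= 3.3),
    (rating >= 3.4) & (rating <= 6.6),
    (rating >= 6.7) & (rating <= 10),
]
choices = ['group1', 'group2', 'group3']

# default fills rows where no condition is True; using a string here
# keeps the result consistent with the string choices.
df['group'] = np.select(conds, choices, default='ungrouped')

print(df)
```

Any row that ends up as 'ungrouped' tells you a value slipped between the ranges, which is easier to spot than a stray 0.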

Output of df.head(): (image not included)

Upvotes: 0
