Reputation: 35
I'm quite new to Python and pandas. I'm trying to add a new column (group) to a data frame, with values based on a partial string in another column (user). Users are coded like this: AA1, AA2, BB1, BB2 and so on. What I want is for the group column to have the value 'AA' for all the AA users. After looking for a way to do this, I came up with the following line:
df['group'] = ['AA' if x x.startswith('AA') else 'other' for x in df['user']]
Well, it doesn't work:
1) I get an invalid syntax error and a line too long error.
2) However, it does work if I replace x.startswith('AA') with x == 'AA1', so is it something with the startswith part?
3) I don't know how to add 'BB' if x.startswith('BB') in the same line. Or should I write a line for each category of user?
Thank you so much
Upvotes: 2
Views: 3198
Reputation: 862591
I think you can use numpy.where with str.startswith or str.contains:
import pandas as pd
import numpy as np

df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2']})
print(df)
#   user
# 0  AA1
# 1  AA2
# 2  BB1
# 3  BB2

# 'AA' where the user starts with 'AA', otherwise 'other'
df['group'] = np.where(df.user.str.startswith('AA'), 'AA', 'other')
# same idea, but matches 'AA' anywhere in the string
df['group1'] = np.where(df.user.str.contains('AA'), 'AA', 'other')
# if you need to extract the first 2 chars of each user
df['g1'] = df.user.str[:2]
print(df)
#   user  group group1  g1
# 0  AA1     AA     AA  AA
# 1  AA2     AA     AA  AA
# 2  BB1  other  other  BB
# 3  BB2  other  other  BB
For extracting substrings, see the pandas documentation on indexing with .str.
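For point 3 of the question (more than one group), the same vectorized idea extends to several prefixes with numpy.select, which takes a list of boolean conditions and a matching list of choices and uses the first condition that is True. A minimal sketch, assuming every group code is a known prefix; the extra user 'CC1' is made-up sample data to show the fallback to 'other':

```python
import numpy as np
import pandas as pd

# 'CC1' is hypothetical sample data that matches none of the prefixes
df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2', 'CC1']})

# one boolean condition per known prefix
prefixes = ['AA', 'BB']
conditions = [df['user'].str.startswith(p) for p in prefixes]

# np.select takes the choice of the first True condition in each row,
# and falls back to 'other' when no prefix matches
df['group'] = np.select(conditions, prefixes, default='other')
print(df)
```

Adding a new group then only means appending its prefix to the prefixes list, rather than writing one line per category.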
Upvotes: 1
Reputation: 2015
df['group'] = ['AA' if x.startswith('AA') else 'other' for x in df['user']]
You just have an extra x before x.startswith('AA'); remove it and the line works.
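If you prefer to stay with the list comprehension, conditional expressions can be chained to cover 'BB' as well. Because the comprehension sits inside brackets, it can be split over several lines without backslashes. (The "line too long" message is most likely a style warning from an editor or linter following PEP 8's 79-character limit, not a Python error.) A sketch, again with the hypothetical extra user 'CC1':

```python
import pandas as pd

# 'CC1' is hypothetical sample data that matches neither prefix
df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2', 'CC1']})

# Chained conditional expressions: each `else` falls through to the next test
df['group'] = ['AA' if x.startswith('AA')
               else 'BB' if x.startswith('BB')
               else 'other'
               for x in df['user']]

# str.startswith also accepts a tuple of prefixes; x[:2] assumes
# every prefix is exactly two characters long
df['group2'] = [x[:2] if x.startswith(('AA', 'BB')) else 'other'
                for x in df['user']]
print(df)
```

The tuple form keeps the line short as more groups are added, but only under the two-character-prefix assumption noted in the comment.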
Upvotes: 2