svn

Reputation: 35

How to add a column in pandas based on a partial string in another column

I'm quite new to Python and pandas. I'm trying to add a new column to a data frame (the group column) with values based on a partial string in another column (the user column). Users are coded like this: AA1, AA2, BB1, BB2 and so on. What I want is for the group column to have the value 'AA' for all the AA users. After looking for a way to do this, I came up with the following line:

df['group'] = ['AA' if x x.startswith('AA') else 'other' for x in df['user']]

Well, it doesn't work: 1) I get "invalid syntax" and "line too long" errors. 2) However, it does work if I change x.startswith('AA') to x == 'AA1', so is it something with the startswith part? 3) I don't know how to add the 'BB' if x x.startswith('BB') in the same line, or should I write a line for each category of user? Thank you so much.

Upvotes: 2

Views: 3198

Answers (2)

jezrael

Reputation: 862591

I think you can use numpy.where with str.startswith or str.contains:

import pandas as pd
import numpy as np

df = pd.DataFrame({'user':['AA1','AA2','BB1','BB2']})
print (df)
  user
0  AA1
1  AA2
2  BB1
3  BB2

df['group'] = np.where(df.user.str.startswith('AA'), 'AA', 'other')
df['group1'] = np.where(df.user.str.contains('AA'), 'AA', 'other')
# if you need to extract the first 2 characters of each user
df['g1'] = df.user.str[:2]
print (df)
  user  group group1  g1
0  AA1     AA     AA  AA
1  AA2     AA     AA  AA
2  BB1  other  other  BB
3  BB2  other  other  BB

To extract a substring, check indexing with str.
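For the third part of the question (handling 'BB' as well as 'AA'), one possible approach is numpy.select, which takes a list of conditions and a matching list of values. This is only a sketch: the prefixes list, the extra 'CC1' user and the 'other' default are assumptions added for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Sample data; 'CC1' is a made-up user added to show the fallback value
df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2', 'CC1']})

# Hypothetical list of known prefixes; extend it as needed
prefixes = ['AA', 'BB']

# One boolean mask per prefix
conditions = [df['user'].str.startswith(p) for p in prefixes]

# The first matching condition wins; rows with no match get 'other'
df['group'] = np.select(conditions, prefixes, default='other')
print(df)
```

If every user code is known to start with its two-letter group, the df.user.str[:2] slice shown above is simpler; np.select is mainly useful when unknown prefixes should fall back to a default value.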

Upvotes: 1

MaThMaX

Reputation: 2015

df['group'] = ['AA' if x.startswith('AA') else 'other' for x in df['user']]

You just have an extra x before x.startswith('AA').
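To cover several categories without writing one line per group, the list comprehension can call a small helper function. The sketch below assumes a hypothetical tuple known_prefixes and a helper named group_for, neither of which appears in the question; the 'CC1' user is also made up to show the fallback.

```python
import pandas as pd

# Sample data; 'CC1' is a made-up user added to show the fallback value
df = pd.DataFrame({'user': ['AA1', 'AA2', 'BB1', 'BB2', 'CC1']})

# Hypothetical tuple of known prefixes
known_prefixes = ('AA', 'BB')

def group_for(user):
    """Return the first known prefix that user starts with, else 'other'."""
    for prefix in known_prefixes:
        if user.startswith(prefix):
            return prefix
    return 'other'

# Same list-comprehension style as the corrected line above
df['group'] = [group_for(x) for x in df['user']]
print(df)
```

This keeps the comprehension short, which should also help with the "line too long" warning mentioned in the question.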

Upvotes: 2
