Reputation: 387
Can someone help me create two new columns in this dataframe?
I want to parse the state out of each title (the variable "s" below) and make sure the state is removed from the original title string. The result should contain the original title, the cleaned title (without the trailing state), and the state name.
import pandas as pd

df = pd.Series(['Accommodation Payroll Employment in Texas',
                'Accounting, Tax Preparation, Bookkeeping, and Payroll Services Payroll Employment in Texas']).to_frame()
df.columns = ['title']

def state_code(row):
    t = None
    s = None
    if len(row['title'].split(' in ')) == 2:
        s = str(row['title'].split(' in ')[1])
        t = str(row['title'].split(' in ')[0])
    elif len(row['title'].split(' in ')) == 3:
        s = str(row['title'].split(' in ')[2])
        # note: the ' in ' between the first two parts is dropped here
        t = str(row['title'].split(' in ')[0] + row['title'].split(' in ')[1])
    elif len(row['title'].split(' for ')) == 2:
        s = str(row['title'].split(' for ')[1])
        t = str(row['title'].split(' for ')[0])
    return t, s

df[['title_clean', 'state']] = df.apply(state_code, axis=1)
Upvotes: 1
Views: 70
Reputation: 76297
Instead of
return t, s
try
return pd.Series(dict(state=s, title_clean=t))
and instead of
df[['title_clean','state']]=df.apply(state_code,axis=1)
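use (assigning the result back, since pd.concat returns a new DataFrame rather than modifying df)
df = pd.concat([df, df.apply(state_code, axis=1)], axis=1)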
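Incidentally, your
t = None
s = None
only matters as a fallback for titles that match none of the branches; without it, such a row would raise an UnboundLocalError at the return.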
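Putting the two changes above together, a minimal end-to-end sketch might look like the following. It reuses the sample titles and the state_code function from the question; the local names parts_in, parts_for and result are introduced here only for illustration, and the three-part branch keeps the question's behaviour of joining the pieces without the ' in ' separator.

```python
import pandas as pd

df = pd.Series([
    'Accommodation Payroll Employment in Texas',
    'Accounting, Tax Preparation, Bookkeeping, and Payroll Services Payroll Employment in Texas',
]).to_frame()
df.columns = ['title']


def state_code(row):
    # Fallback values for titles that match none of the patterns below.
    t = None
    s = None
    # Hypothetical helper names; the question calls split() inline instead.
    parts_in = row['title'].split(' in ')
    parts_for = row['title'].split(' for ')
    if len(parts_in) == 2:
        t, s = parts_in
    elif len(parts_in) == 3:
        # Same as the question's code: the pieces are joined without ' in '.
        t = parts_in[0] + parts_in[1]
        s = parts_in[2]
    elif len(parts_for) == 2:
        t, s = parts_for
    # Returning a Series makes apply(axis=1) produce one column per key.
    return pd.Series(dict(state=s, title_clean=t))


# pd.concat returns a new DataFrame, so keep the result.
result = pd.concat([df, df.apply(state_code, axis=1)], axis=1)
print(result[['title_clean', 'state']])
```

Because state_code returns a Series, df.apply(state_code, axis=1) yields a DataFrame with state and title_clean columns, which pd.concat then places next to the original title column.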
Upvotes: 2