Reputation: 109
I have a pandas data frame that looks like this (index, Place):
0 California*
1 LA
2 San Diego
3 Oakland
4 Washington*
5 Seattle
6 Tacoma
The values in Place that are states are marked with an asterisk, so I'd like to create a new column and fill it in with the state value. I want to assign the value for state based on any row that has an asterisk and then fill it for all the subsequent rows until a new row comes with a new value. In SAS, this would be done with a retain statement, but I don't know how it works with pandas.
The output should be (index, Place, State):
0 California* California
1 LA California
2 San Diego California
3 Oakland California
4 Washington* Washington
5 Seattle Washington
6 Tacoma Washington
etc.
What's a simple and elegant way of going about this?
Upvotes: 3
Views: 374
Reputation: 294278
creative answer
# assumes: import numpy as np; import pandas as pd
df.iloc[:, 0].str.split('*').str[::-1] \
    .apply(pd.Series).replace('', np.nan).bfill(axis=1).ffill()
serious answer
uses extract, like @MaxU. The difference is that I'm using insert. Nothing notably better, just different.
df.insert(df.shape[1], 'State', df.iloc[:, 0].str.extract(r'(.*)\*', expand=False))
df = df.ffill()
df
Upvotes: 0
Reputation: 210842
You can use Series.str.extract() method in conjunction with Series.ffill():
In [236]: df['State'] = df.Place.str.extract(r'([^\*]+)\*', expand=False).ffill()
In [237]: df
Out[237]:
Place State
0 California* California
1 LA California
2 San Diego California
3 Oakland California
4 Washington* Washington
5 Seattle Washington
6 Tacoma Washington
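For anyone who wants to reproduce this outside an interactive session, here is a minimal self-contained sketch of the same extract-then-ffill idea. The frame is rebuilt from the sample in the question, and the intermediate name state is only there for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; the column name "Place" follows the post.
df = pd.DataFrame({"Place": ["California*", "LA", "San Diego", "Oakland",
                             "Washington*", "Seattle", "Tacoma"]})

# Capture the text before an asterisk; rows without an asterisk become NaN.
state = df["Place"].str.extract(r"([^*]+)\*", expand=False)

# Carry the most recent state forward over the NaN rows.
df["State"] = state.ffill()
print(df)
```

Passing expand=False makes extract return a Series rather than a one-column DataFrame, so the result can be assigned directly to a new column.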
Upvotes: 1
Reputation: 862671
You can use Series.where with a mask created by indexing with str, followed by ffill:
print (df.col.str[-1] == '*')
0 True
1 False
2 False
3 False
4 True
5 False
6 False
Name: col, dtype: bool
df['new'] = df.col.str[:-1].where(df.col.str[-1] == '*').ffill()
print (df)
col new
0 California* California
1 LA California
2 San Diego California
3 Oakland California
4 Washington* Washington
5 Seattle Washington
6 Tacoma Washington
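As a runnable variant of the above, the mask can also be built with str.endswith, which some may find easier to read than comparing the last character. This is only a sketch on the question's sample data; the name is_state is made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; "col" and "new" match the names used above.
df = pd.DataFrame({"col": ["California*", "LA", "San Diego", "Oakland",
                           "Washington*", "Seattle", "Tacoma"]})

# True for rows whose value ends with an asterisk (the state rows).
is_state = df["col"].str.endswith("*")

# Drop the trailing character, keep the value only on state rows, then forward-fill.
df["new"] = df["col"].str[:-1].where(is_state).ffill()
print(df)
```

Note that any rows appearing before the first starred row would stay NaN, since ffill has nothing to carry forward yet.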
Upvotes: 2