AidKulesh

Reputation: 109

Python Retaining Column Value

I have a pandas data frame that looks like this (index, Place):

0 California*
1 LA
2 San Diego
3 Oakland
4 Washington*
5 Seattle
6 Tacoma

The values in Place that are states are marked with an asterisk, so I'd like to create a new column and fill it with the state value. Each row with an asterisk should set the state, and that value should carry down through all subsequent rows until the next row with an asterisk. In SAS, this would be done with a retain statement, but I don't know how to do it in pandas.

The output should be (index, Place, State):

0 California* California
1 LA          California
2 San Diego   California
3 Oakland     California
4 Washington* Washington
5 Seattle     Washington
6 Tacoma      Washington

etc.

What's a simple and elegant way of going about this?

Upvotes: 3

Views: 374

Answers (3)

piRSquared

Reputation: 294278

creative answer

df.iloc[:, 0].str.split('*').str[::-1] \
  .apply(pd.Series).replace('', np.nan).bfill(axis=1).ffill()
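The one-liner is dense, so here is a step-by-step sketch of what it does. The sample frame is rebuilt from the question, and the intermediate names (parts, wide, result) and the final column labels are my own additions for illustration, not part of the original chain.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; "Place" is the column name used there.
df = pd.DataFrame({"Place": ["California*", "LA", "San Diego", "Oakland",
                             "Washington*", "Seattle", "Tacoma"]})

# Split on the literal asterisk and reverse each list:
# 'California*' -> ['California', ''] -> ['', 'California'];  'LA' -> ['LA']
parts = df["Place"].str.split("*").str[::-1]

# Expand the lists into columns (short lists are padded with NaN),
# then turn the empty strings into NaN as well.
wide = parts.apply(pd.Series).replace("", np.nan)

# bfill along the columns copies the state name into column 0 of starred rows;
# ffill down the rows then carries each state name through column 1.
result = wide.bfill(axis=1).ffill()
result.columns = ["Place", "State"]  # hypothetical labels for readability

print(result)
```

Note that the asterisk is dropped from the first column as a side effect of the split.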



serious answer
This uses extract, like @MaxU's answer. The difference is that I add the column with insert. Nothing notably better, just different.

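# expand=False makes extract return a Series; the raw string keeps \* a literal asterisk
df.insert(df.shape[1], 'State', df.iloc[:, 0].str.extract(r'(.*)\*', expand=False))
df = df.ffill()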

df


Upvotes: 0

MaxU - stand with Ukraine

Reputation: 210842

You can use the Series.str.extract() method in conjunction with Series.ffill():

In [236]: df['State'] = df.Place.str.extract(r'([^\*]+)\*', expand=False).ffill()

In [237]: df
Out[237]:
         Place       State
0  California*  California
1           LA  California
2    San Diego  California
3      Oakland  California
4  Washington*  Washington
5      Seattle  Washington
6       Tacoma  Washington
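For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same extract-then-ffill idea. The frame is rebuilt from the question, and the intermediate name marked is only there to show what extract returns before filling.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Place": ["California*", "LA", "San Diego", "Oakland",
                             "Washington*", "Seattle", "Tacoma"]})

# Capture the text before the asterisk; rows without an asterisk do not match
# the pattern, so they come back as NaN.
marked = df["Place"].str.extract(r"([^\*]+)\*", expand=False)

# Forward-fill carries the last seen state down to the following rows,
# which is roughly what a SAS retain would do here.
df["State"] = marked.ffill()

print(df)
```

If the very first row had no asterisk, its State would stay NaN, since there is nothing earlier to fill from.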

Upvotes: 1

jezrael

Reputation: 862671

You can use Series.where with a boolean mask built by indexing with str, followed by ffill (the column is named col here):

print (df.col.str[-1] == '*')
0     True
1    False
2    False
3    False
4     True
5    False
6    False
Name: col, dtype: bool

df['new'] = df.col.str[:-1].where(df.col.str[-1] == '*').ffill()
print (df)
           col         new
0  California*  California
1           LA  California
2    San Diego  California
3      Oakland  California
4  Washington*  Washington
5      Seattle  Washington
6       Tacoma  Washington
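A self-contained sketch of the same where-plus-ffill approach, assuming the column is named Place as in the question. Using str.endswith for the mask instead of comparing str[-1] is my substitution for readability; both build the same boolean Series on this data.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Place": ["California*", "LA", "San Diego", "Oakland",
                             "Washington*", "Seattle", "Tacoma"]})

# True for rows whose value ends with an asterisk (the state rows).
is_state = df["Place"].str.endswith("*")

# Strip the last character everywhere, keep the result only on state rows
# (other rows become NaN), then carry each state name forward.
df["State"] = df["Place"].str[:-1].where(is_state).ffill()

print(df)
```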

Upvotes: 2
