How to replace partial strings within a df column

Question

I have the following table. Some entries in the StockCode column have a letter at the end. How do I replace these letters with a number?

I could make a dictionary to map each letter to a number, but I feel there is a quicker way to do so.

   InvoiceNo StockCode
0        536     85123
1        536    71053Z
2        536    84406B
3        536    22623S

Ben · Accepted Answer

This should do the trick.

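import pandas as pd

# Make some data
df = pd.DataFrame({
    'InvoiceNo': 536,
    'StockCode': ['85123', '71053Z', '84406B', '22623S']
})

# Define replacements in a dictionary. (Note the values are strings, not ints)
replacements = {'Z': '1', 'B': '2', 'S': '3'}

#-- Edit per OP's comment ---------------
# Map every uppercase letter to its alphabet index: A -> '0', B -> '1', ..., Z -> '25'.
# This overwrites the three-letter dictionary above. The output printed below
# corresponds to the three-letter dictionary; with this mapping Z, B and S
# become '25', '1' and '18' instead.
import string
keys = list(string.ascii_uppercase)
values = [str(i) for i in range(len(keys))]
replacements = dict(zip(keys, values))
#---------------------------------------

# Rebuild StockCode
# 1) df.StockCode.str.extract(r'^(\d+)', expand=False) extracts the run of digits at the start of each StockCode, up to the first non-digit character
# 2) df.StockCode.str.extract(r'([A-Z])$', expand=False) extracts a single uppercase letter at the end of each string (NaN if there is none)
# 3) .replace(replacements).fillna('') makes the replacements and then changes NaN to ''
# 4) adding two Series of strings concatenates them element-wise
# Raw strings (r'...') keep Python from treating \d as an escape sequence;
# expand=False makes str.extract return a Series instead of a one-column DataFrame.
df['StockCode'] = (df.StockCode.str.extract(r'^(\d+)', expand=False) +
                   df.StockCode.str.extract(r'([A-Z])$', expand=False).replace(replacements).fillna(''))

print(df)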
   InvoiceNo StockCode
0        536     85123
1        536    710531
2        536    844062
3        536    226233
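Running the four steps separately makes each intermediate result visible. This is a minimal sketch, assuming pandas is installed; the variable names digits, letter, suffix and result are illustrative and not part of the answer's code. It uses the original three-letter dictionary, so the result column should match the output above.

```python
import pandas as pd

df = pd.DataFrame({"StockCode": ["85123", "71053Z", "84406B", "22623S"]})
replacements = {"Z": "1", "B": "2", "S": "3"}

# Step 1: leading digits; expand=False returns a Series, not a one-column DataFrame
digits = df["StockCode"].str.extract(r"^(\d+)", expand=False)

# Step 2: a single trailing uppercase letter, or NaN when there is none
letter = df["StockCode"].str.extract(r"([A-Z])$", expand=False)

# Step 3: map letters to digit strings, then turn NaN (no letter) into ''
suffix = letter.replace(replacements).fillna("")

# Step 4: '+' on two string Series concatenates them element-wise
result = digits + suffix

print(pd.DataFrame({"digits": digits, "letter": letter,
                    "suffix": suffix, "result": result}))
```

The fillna('') in step 3 matters: without it, the row with no trailing letter would become NaN after concatenation.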

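The key here is understanding regular expressions: ^(\d+) matches the digits at the start of the string, and ([A-Z])$ matches a single uppercase letter at the end.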
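If every letter should map to its position in the alphabet, as in the string.ascii_uppercase edit, the dictionary can be skipped altogether: pandas' Series.str.replace accepts a callable replacement when regex=True, and calls it with each regex match object. This is a minimal sketch, assuming each code ends in at most one uppercase letter and that A -> '0', ..., Z -> '25' is the desired scheme; letter_to_number is a hypothetical helper name.

```python
import pandas as pd

df = pd.DataFrame({
    "InvoiceNo": 536,
    "StockCode": ["85123", "71053Z", "84406B", "22623S"],
})

def letter_to_number(match):
    # Assumed scheme: alphabet index, A -> '0', B -> '1', ..., Z -> '25'
    return str(ord(match.group(1)) - ord("A"))

# Replace one trailing uppercase letter; codes without one are left unchanged.
# regex=True is required for a callable replacement.
df["StockCode"] = df["StockCode"].str.replace(r"([A-Z])$", letter_to_number, regex=True)
print(df)
```

With this scheme 71053Z becomes 7105325, 84406B becomes 844061 and 22623S becomes 2262318, the same values the dictionary-based answer produces with the edit applied.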