Extracting number from string after a specific character

Question

I have a dataframe (~1 million rows) with a column ('Product') that contains strings such as 'none', 'q1', 'q123', or 'q12_a123'.

I would like to extract the number that follows the letter 'q' and enter it into another column ('AmountPaid') so that it looks like the following:

'Product'    'AmountPaid'
 none            0
 q1              1
 q123            123
 q12_a123        12

So far, I have:

for i in range(0,1000000):
   if 'q' not in df.loc[i,'Product']:
      df.loc[i,'AmountPaid']=0
   else:
      # set 'AmountPaid' to the number following 'q'

Questions:

How to extract the number(s) immediately following the letter 'q', but not necessarily everything after it? For example, extract 12 from 'q12_a123'.
Most of the 'AmountPaid' entries will be set to 0. Is there a more efficient way than the for loop and if/else statement above?

cs95 · Accepted Answer

You're looking for str.extract with a lookbehind on the character 'q'.

df['AmountPaid'] = df.Product.str.extract(
      r'(?<=q)(\d+)', expand=False
).fillna(0).astype(int)

df

    Product  AmountPaid
0      none           0
1        q1           1
2      q123         123
3  q12_a123          12

Extracting number from string after a specific character

Answers (2)

Related Questions