alinpaca
alinpaca

Reputation: 45

Extracting number from string after a specific character

I have a dataframe (~1 million rows) with a column ('Product') that contains strings such as 'none', 'q1', 'q123', or 'q12_a123'.

I would like to extract the number that follows the letter 'q' and enter it into another column ('AmountPaid') so that it looks like the following:

'Product'    'AmountPaid'
 none            0
 q1              1
 q123            123
 q12_a123        12

So far, I have:

for i in range(0,1000000):
   if 'q' not in df.loc[i,'Product']:
      df.loc[i,'AmountPaid']=0
   else:
      # set 'AmountPaid' to the number following 'q'

Questions:

  1. How to extract the number(s) immediately following the letter 'q', but not necessarily everything after it? For example, extract 12 from 'q12_a123'.
  2. Most of the 'AmountPaid' entries will be set to 0. Is there a more efficient way than the for loop and if/else statement above?

Upvotes: 4

Views: 1757

Answers (2)

mozway
mozway

Reputation: 260380

In complement to cs95's answer, since str.extract only captures what is inside capturing groups, a lookbehind is not needed.

It's possible to use directly q(\d+):

df['AmountPaid'] = (df['Product'].str.extract(r'q(\d+)', expand=False)
                                 .fillna(0).astype(int)
                    )

Output:

    Product AmountPaid
0      none          0
1        q1          1
2      q123        123
3  q12_a123         12

This is not just shorter in terms of syntax but also more efficient:

Upvotes: 0

cs95
cs95

Reputation: 402303

You're looking for str.extract with a lookbehind on the character 'q'.

df['AmountPaid'] = df.Product.str.extract(
      r'(?<=q)(\d+)', expand=False
).fillna(0).astype(int)

df

    Product  AmountPaid
0      none           0
1        q1           1
2      q123         123
3  q12_a123          12

Upvotes: 5

Related Questions