Reputation: 45
I have a dataframe (~1 million rows) with a column ('Product') that contains strings such as 'none', 'q1', 'q123', or 'q12_a123'.
I would like to extract the number that follows the letter 'q' and enter it into another column ('AmountPaid') so that it looks like the following:
'Product' 'AmountPaid'
none 0
q1 1
q123 123
q12_a123 12
So far, I have:
for i in range(0,1000000):
if 'q' not in df.loc[i,'Product']:
df.loc[i,'AmountPaid']=0
else:
# set 'AmountPaid' to the number following 'q'
Questions:
Upvotes: 4
Views: 1757
Reputation: 260380
In complement to cs95's answer, since str.extract
only captures what is inside capturing groups, a lookbehind is not needed.
It's possible to use directly q(\d+)
:
df['AmountPaid'] = (df['Product'].str.extract(r'q(\d+)', expand=False)
.fillna(0).astype(int)
)
Output:
Product AmountPaid
0 none 0
1 q1 1
2 q123 123
3 q12_a123 12
This is not just shorter in terms of syntax but also more efficient:
Upvotes: 0
Reputation: 402303
You're looking for str.extract
with a lookbehind on the character 'q'
.
df['AmountPaid'] = df.Product.str.extract(
r'(?<=q)(\d+)', expand=False
).fillna(0).astype(int)
df
Product AmountPaid
0 none 0
1 q1 1
2 q123 123
3 q12_a123 12
Upvotes: 5