regular expression using pandas string match

Question

Input data:

                        name  Age Zodiac Grade            City  pahun
0                   /extract   30  Aries     A            Aura  a_b_c
1  /abc/236466/touchbar.html   20    Leo    AB      Somerville  c_d_e
2                    Brenda4   25  Virgo     B  Hendersonville    f_g
3     /abc/256476/mouse.html   18  Libra    AA          Gannon  h_i_j

I am trying to filter rows using a regex on the name column. The regex should match names that contain a number exactly 6 digits long.

For example:
/abc/236466/touchbar.html  - 236466

Here is the code I have used

df=df[df['name'].str.match(r'\d{6}') == True]

The above line does not match any rows.

Expected:

                         name  Age Zodiac Grade            City  pahun
0  /abc/236466/touchbar.html   20    Leo    AB      Somerville  c_d_e
1     /abc/256476/mouse.html   18  Libra    AA          Gannon  h_i_j

Can anyone tell me what I am doing wrong?

Wiktor Stribiżew · Accepted Answer

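str.match only looks for a match at the start of the string (it behaves like re.match). Every value in your name column starts with / or a letter, so \d{6} can never match at position 0. If you want to match / + 6 digits + / somewhere inside the string using str.match, you need to consume the leading characters first, with one of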

df=df[df['name'].str.match(r'.*/\d{6}/')]      # assuming the match is closer to the end of the string
df=df[df['name'].str.match(r'(?s).*/\d{6}/')]  # same, but . also matches line breaks (multiline values)
df=df[df['name'].str.match(r'.*?/\d{6}/')]     # assuming the match is closer to the start of the string
df=df[df['name'].str.match(r'(?s).*?/\d{6}/')] # same, but . also matches line breaks (multiline values)
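
A minimal sketch of the anchoring behaviour, assuming a Series built from the name values in the question (the variable names are illustrative only):

```python
import pandas as pd

# The "name" column from the question, rebuilt as a standalone Series.
names = pd.Series([
    "/extract",
    "/abc/236466/touchbar.html",
    "Brenda4",
    "/abc/256476/mouse.html",
])

# str.match anchors at the start of each string, so a bare \d{6}
# fails for every value here (none of them starts with a digit).
anchored = names.str.match(r"\d{6}")

# Prefixing the pattern with .* lets the regex skip the leading characters.
prefixed = names.str.match(r".*/\d{6}/")

print(anchored.tolist())
print(prefixed.tolist())
```

Only the two URL-like values should come back as True for the prefixed pattern.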

However, it is more reasonable and efficient here to use str.contains with a regex like

df=df[df['name'].str.contains(r'/\d{6}/')]

to find entries containing / + 6 digits + /.
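
As a rough end-to-end check, the question's data can be rebuilt and filtered with str.contains. The reset_index call is an addition here, used only so the index matches the expected output shown in the question:

```python
import pandas as pd

# Input data from the question.
df = pd.DataFrame({
    "name": ["/extract", "/abc/236466/touchbar.html", "Brenda4", "/abc/256476/mouse.html"],
    "Age": [30, 20, 25, 18],
    "Zodiac": ["Aries", "Leo", "Virgo", "Libra"],
    "Grade": ["A", "AB", "B", "AA"],
    "City": ["Aura", "Somerville", "Hendersonville", "Gannon"],
    "pahun": ["a_b_c", "c_d_e", "f_g", "h_i_j"],
})

# str.contains searches anywhere in the string, so no leading .* is needed.
mask = df["name"].str.contains(r"/\d{6}/")

# Keep matching rows and renumber the index from 0.
result = df[mask].reset_index(drop=True)
print(result)
```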

Or, to make sure you just match 6 digit chunks and not 7+ digit chunks:

df=df[df['name'].str.contains(r'(?<!\d)\d{6}(?!\d)')]

where

(?<!\d) - makes sure there is no digit on the left
\d{6} - any six digits
(?!\d) - no digit on the right is allowed.
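
To see what the lookarounds change, here is a small sketch comparing a bare \d{6} with the guarded pattern. The 7-digit path and the "id236466" value are hypothetical test inputs, not part of the question's data:

```python
import pandas as pd

names = pd.Series([
    "/abc/236466/touchbar.html",  # exactly six digits (from the question)
    "/abc/1234567/long.html",     # seven digits (hypothetical)
    "id236466",                   # six digits, no slashes (hypothetical)
    "Brenda4",                    # one digit (from the question)
])

# A bare \d{6} also matches inside longer digit runs.
loose = names.str.contains(r"\d{6}")

# The lookbehind and lookahead reject runs with a digit on either side.
exact_six = names.str.contains(r"(?<!\d)\d{6}(?!\d)")

print(loose.tolist())
print(exact_six.tolist())
```

Unlike /\d{6}/, this pattern does not require the surrounding slashes, so it also accepts a value like "id236466".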
