Guga

Reputation: 349

Regex in pandas: Match vs Findall

I am confused about when to use str.findall and when to use str.match.

For example, I have a df that has many lines of text where I need to extract dates.

Let us say I want to check the lines that contain the word Mar (the abbreviation of March).

If I filter the df to the rows where there is a match:

df[df.original.str.match(r'(Mar)')==True]

I get the following output:

204 Mar 10 1976 CPT Code: 90791: No medical servic...
299 March 1974 Primary ...

However, if I use the same regex with str.findall, I get only empty lists:

0      []
1      []
2      []
3      []
4      []
5      []
6      []
7      []
...

495              []
496              []
497              []
498              []
499              []
Name: original, Length: 500, dtype: object

Why is that? I am sure I am missing something about how match, find, findall, extract and extractall differ.

Upvotes: 0

Views: 1888

Answers (1)

ileadall42

Reputation: 651

I will try to explain this using the example from the documentation:

import pandas as pd

s = pd.Series(["a1a2", "b1", "c1"], index=["A", "B", "C"])
s

output:

A    a1a2
B      b1
C      c1
dtype: object

We first build the Series above, and then apply extract, extractall, find and findall to it.

# extract takes a pattern (a regex with capture groups) and returns only the first match in each string
s.str.extract(r"([ab])(\d)", expand=True)

    0   1
A   a   1
B   b   1
C   NaN NaN

s.str.extractall(r"([ab])(\d)")  # returns every match, one row per match

         0  1
  match
A 0      a  1
  1      a  2
B 0      b  1
s.str.find(r"([ab])(\d)")  # every value is -1 because find searches for a literal substring, not a regex

s.str.find('a')
A    0
B   -1
C   -1
dtype: int64

s.str.findall(r"([ab])(\d)")  # takes a string or regex and returns a list of all matches per string
A    [(a, 1), (a, 2)]
B            [(b, 1)]
C                  []
dtype: object
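Coming back to match versus findall from the question, a small sketch may help. The sample strings below are made up and only stand in for df.original. As far as I understand, str.match only tests at the start of each string (like re.match), str.contains searches anywhere (like re.search), and str.findall returns a list of every match (like re.findall).

```python
import pandas as pd

# Hypothetical sample data standing in for df.original from the question.
text = pd.Series([
    "Mar 10 1976 CPT Code: 90791",
    "Seen on 5 Mar 1974 and again in March 1975",
    "No date here",
])

# Boolean per row: does the string START with the pattern?
starts_with_mar = text.str.match(r"Mar")

# Boolean per row: does the pattern occur ANYWHERE in the string?
contains_mar = text.str.contains(r"Mar")

# List per row: every non-overlapping occurrence of the pattern.
all_mar = text.str.findall(r"Mar")

print(starts_with_mar.tolist())  # [True, False, False]
print(contains_mar.tolist())     # [True, True, False]
print(all_mar.tolist())          # [['Mar'], ['Mar', 'Mar'], []]
```

So match and contains give a boolean mask you can use to filter rows, while findall gives the matched text itself, with an empty list for rows that have no match.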

Upvotes: 1
