Cuisilopez

Reputation: 39

Extract substring between two characters - python DataFrame

What is the meaning of the regex pattern ',\s*([^\.]*)\s*\.'?

I have a DataFrame identical to the one in "Extract sub-string between 2 special characters from one column of Pandas DataFrame"

and I want to extract the substring located between "," and ".". Based on the answer to that post, one way to do it is:

In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)

In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master

Although the output is correct, I don't understand the pattern ',\s*([^\.]*)\s*\.'. In particular, what do '*' and '\' mean?
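To see what str.extract is doing row by row, here is a minimal sketch that uses only Python's re module, assuming the four names from the output above. With a single capture group and expand=False, str.extract gives the first match's group for each row; extract_title is a hypothetical helper name, and it returns None where pandas would put NaN.

```python
import re

# Same pattern as in the question; group 1 holds the title.
PATTERN = re.compile(r',\s*([^\.]*)\s*\.')

names = ["Jim, Mr. Jones", "Sara, Miss. Baker", "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]

def extract_title(name):
    """Hypothetical helper: return group 1 of the first match, or None if no match."""
    match = PATTERN.search(name)
    return match.group(1) if match else None

titles = [extract_title(n) for n in names]
print(titles)  # ['Mr', 'Miss', 'Mrs', 'Master']
```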

Upvotes: 0

Views: 4431

Answers (1)

Dani Mesejo

Reputation: 61930

The pattern matches:

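  • a , (comma)
  • followed by \s*: zero or more whitespace characters (tabs, spaces, etc.). \s matches one whitespace character and * means "zero or more of the preceding item"
  • followed by ([^\.]*): zero or more characters that are not a . (dot). The parentheses make this a capturing group, and the text it captures is what str.extract returns
  • followed by \s*: zero or more whitespace characters
  • followed by \.: a literal . (dot). The \ escapes the dot, which on its own would match any character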
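The breakdown above can be written directly into the pattern. This is a sketch that uses re.VERBOSE (where unescaped whitespace is ignored and # starts a comment) so each piece carries its own comment; it checks that the commented version behaves like the compact one on a few sample strings made up for illustration.

```python
import re

# The question's pattern, rewritten with re.VERBOSE so each piece can be commented.
VERBOSE_PATTERN = re.compile(
    r"""
    ,           # a literal comma
    \s*         # \s = one whitespace character; * = zero or more of it
    ([^\.]*)    # group 1: zero or more characters that are not a dot
    \s*         # zero or more whitespace characters
    \.          # a literal dot (the backslash stops . meaning "any character")
    """,
    re.VERBOSE,
)
COMPACT_PATTERN = re.compile(r',\s*([^\.]*)\s*\.')

samples = ["Jim, Mr. Jones", "Sara,   Miss. Baker", "Leila,Mrs. Jacob"]
results = [
    (VERBOSE_PATTERN.search(s).group(1), COMPACT_PATTERN.search(s).group(1))
    for s in samples
]
print(results)  # [('Mr', 'Mr'), ('Miss', 'Miss'), ('Mrs', 'Mrs')]

# Because [^\.]* also matches spaces and is greedy, a space right before the dot
# ends up inside the group; the second \s* then matches the empty string.
spaced = VERBOSE_PATTERN.search("Jim, Mr . Jones").group(1)
print(repr(spaced))  # 'Mr '
```

The last case shows that the trailing \s* does not trim whitespace from the captured title; if that matters, stripping the result afterwards is one simple option.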

You can find more about regex here.

UPDATE

As @UnbearableLightness mentioned, the \ used to escape the . (dot) is redundant inside a character set: there a dot already matches a literal dot, so [^\.] and [^.] are equivalent. A character set is anything defined between [].
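A quick sketch of that point, using sample strings chosen for illustration: the pattern with [^\.] and the one with [^.] capture the same thing, while outside a character set dropping the backslash does change the meaning. first_group is a hypothetical helper name.

```python
import re

# Inside [...] a dot is already literal, so these two patterns are equivalent.
with_escape = re.compile(r',\s*([^\.]*)\s*\.')
without_escape = re.compile(r',\s*([^.]*)\s*\.')

def first_group(pattern, text):
    """Hypothetical helper: return group 1 of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

samples = ["Jim, Mr. Jones", "Ramu, Master. Kuttan", "no comma or dot"]
same = [first_group(with_escape, s) == first_group(without_escape, s) for s in samples]
print(same)  # [True, True, True]

# Outside a character set the escape matters: an unescaped . matches any character.
print(re.fullmatch(r'a.b', 'axb') is not None)   # True
print(re.fullmatch(r'a\.b', 'axb') is not None)  # False
```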

Upvotes: 2
