Reputation: 39
What is the meaning of the string locator ',\s*([^\.]*)\s*\.'?
I have a dataframe identical to the one in "Extract sub-string between 2 special characters from one column of Pandas DataFrame" and want to extract the substring located between "," and ".". Thanks to the answer to that post, one way is as below:
In [157]: df['Title'] = df.Name.str.extract(r',\s*([^\.]*)\s*\.', expand=False)
In [158]: df
Out[158]:
                   Name   Title
0        Jim, Mr. Jones      Mr
1     Sara, Miss. Baker    Miss
2     Leila, Mrs. Jacob     Mrs
3  Ramu, Master. Kuttan  Master
Although I can see that the outcome is correct, what is the meaning of ',\s*([^\.]*)\s*\.'? In particular, what is the meaning of '*' and '\'?
Upvotes: 0
Views: 4431
Reputation: 61930
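It means the following. Match, in order:
,          a literal comma
\s*        zero or more whitespace characters (tabs, spaces, etc.)
([^\.]*)   zero or more characters that are not a . (dot), captured as a group
\s*        zero or more whitespace characters
\.         a literal . (dot)
So '*' means "zero or more of the preceding item", and '\' either starts a special sequence (\s for whitespace) or escapes a special character (\. for a literal dot).
You can find more about regex in here.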
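To see the pieces above in action outside pandas, here is a minimal sketch using Python's standard `re` module. The pattern is the one from the question, written with `re.VERBOSE` so that each part can carry a comment. The helper name `extract_title` and the `names` list are only illustrative and are not from the original post.

```python
import re

# Same pattern as in the question, split into commented pieces.
# With re.VERBOSE, unescaped whitespace and "#" comments inside the pattern are ignored.
PATTERN = re.compile(r"""
    ,          # a literal comma
    \s*        # zero or more whitespace characters
    ([^\.]*)   # capture group: zero or more characters that are not a dot
    \s*        # zero or more whitespace characters
    \.         # a literal dot
""", re.VERBOSE)


def extract_title(name):
    """Return the captured text between the first ',' and the next '.', or None."""
    match = PATTERN.search(name)
    return match.group(1) if match else None


# Illustrative data mirroring the Name column shown in the question.
names = ["Jim, Mr. Jones", "Sara, Miss. Baker", "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]
titles = [extract_title(n) for n in names]
print(titles)  # ['Mr', 'Miss', 'Mrs', 'Master']
```

`Series.str.extract` with `expand=False` returns the first capture group for each row, which is why only the part inside the parentheses ends up in the Title column.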
UPDATE
As @UnbearableLightness mentioned, the character \ is redundant when used inside a character set to escape the . (dot). A character set is anything defined between [], so [^\.] can also be written as [^.].
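A quick way to check the point about the redundant backslash is to compare both spellings on the same input. This is a small sketch with Python's `re` module. The variable names and sample strings are illustrative only.

```python
import re

# Illustrative data mirroring the Name column shown in the question.
names = ["Jim, Mr. Jones", "Sara, Miss. Baker", "Leila, Mrs. Jacob", "Ramu, Master. Kuttan"]

escaped = re.compile(r",\s*([^\.]*)\s*\.")   # dot escaped inside the character set
unescaped = re.compile(r",\s*([^.]*)\s*\.")  # dot left unescaped inside the character set

escaped_titles = [escaped.search(n).group(1) for n in names]
unescaped_titles = [unescaped.search(n).group(1) for n in names]
same_result = escaped_titles == unescaped_titles
print(same_result)  # True

# Outside a character set the backslash is NOT redundant:
# "\." only matches a literal dot, while a bare "." matches any character.
no_dot = "Jim, Mr Jones"
literal_dot_match = re.search(r",\s*([^.]*)\s*\.", no_dot)
any_char_match = re.search(r",\s*([^.]*)\s*.", no_dot)
print(literal_dot_match is None)      # True: the string contains no dot
print(any_char_match is not None)     # True: the bare "." matched an ordinary character
```

Inside `[]` the dot has no special meaning, so both forms behave the same. The final `\.` in the pattern must keep its backslash.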
Upvotes: 2