Reputation: 229
I have the following string:
*H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1
1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA
The expected output is:
1) *H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1
2) 1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA
The string above is a combination of author names and addresses.
Sometimes the string contains a semicolon (;) after the last name, e.g. S. MOORE1; and sometimes it does not, e.g. S. MOORE1
I tried the regex below, but it does not give the expected results. Please help, as I am new to regex.
;?[\d*]\w+
The pattern is:
A word followed by a digit, then a semicolon or a space, then a digit followed by a word. For example: S. MOORE1(; or space)1Pathology. I need to split this into S. MOORE1 and 1Pathology.
Thanks
Upvotes: 2
Views: 194
Reputation: 1713
Try this one:
(?<=\w\d)[; ](?=\d\w)
It matches a ; or a space that is preceded by a word character and a digit, and followed by a digit and a word character.
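As a minimal sketch of how this pattern could be used for the split, here it is in Python. The language is an assumption, since the question does not name one, and the variable names are made up for illustration. The sample input assumes the names and addresses are joined by a single space or a single semicolon.

```python
import re

# Pattern from the answer: a ';' or space sitting between
# "<word char><digit>" and "<digit><word char>".
SPLIT_RE = re.compile(r"(?<=\w\d)[; ](?=\d\w)")

names = "*H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1"
addresses = ("1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., "
             "Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA")

# Try both separators mentioned in the question.
for separator in (" ", ";"):
    parts = SPLIT_RE.split(names + separator + addresses)
    print(parts)
```

A semicolon followed by a space is not split by this pattern, because the character class matches exactly one character.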
Edit: this version also takes into account a comma before the final digit (as in 2,3,4,5), a semicolon followed by a space, and possible newline characters:
(?<=[\w,]\d)[; ]+[\r\n\f]*(?=\d\w)
You can also use Expresso to test regular expressions.
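A hedged sketch of the extended pattern, again in Python (an assumption, as the thread does not name a language). The helper name `split_names_and_addresses` is hypothetical, and the sample strings are shortened from the question's input so that the names part ends in a comma-separated marker.

```python
import re

# Extended pattern from the answer: one or more ';' or spaces,
# then optional line-break characters, between the two halves.
SPLIT_EXT_RE = re.compile(r"(?<=[\w,]\d)[; ]+[\r\n\f]*(?=\d\w)")

def split_names_and_addresses(text):
    """Split the combined string at the names/addresses junction."""
    return SPLIT_EXT_RE.split(text)

# Shortened sample: the last author marker is ",5", which needs the [\w,] lookbehind.
names = "*H. NGUYEN1, J. SATZ2,3,4,5"
addresses = ("1Pathology, 2Mol. Physiol. and Biophysics, Univ. of Iowa, "
             "Iowa City, IA; 5Howard Hughes Med. Inst.")

for separator in (" ", ";", "; ", ";\r\n"):
    print(split_names_and_addresses(names + separator + addresses))
```

A bare newline with no semicolon or space in front of it is not matched, since `[; ]+` requires at least one of those characters.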
Upvotes: 1
Reputation: 14521
I have read your description many times, but I do not find it clear.
My best guess is that you need to break the line before a token that starts with '1' and has a capital letter as its second character, which is as simple as:
1[A-Z]
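One way this guess could be turned into a split, sketched in Python (the language and the helper name `split_at_first_affiliation` are assumptions, not part of the answer): find the first place the pattern occurs and cut the string there.

```python
import re

# Pattern from the answer: a '1' immediately followed by a capital letter.
START_RE = re.compile(r"1[A-Z]")

def split_at_first_affiliation(text):
    """Return (names, addresses), or None when the pattern is not found."""
    match = START_RE.search(text)
    if match is None:
        return None
    # Drop the separator (space or semicolon) left at the end of the names part.
    return text[:match.start()].rstrip(" ;"), text[match.start():]

print(split_at_first_affiliation(
    "K. CAMPBELL2,3,4,5, S. MOORE1;1Pathology, 2Mol. Physiol."))
```

This only works when the first affiliation is numbered 1 and starts with a capital letter, and it would cut too early if an author marker "1" were ever directly followed by a capital letter.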
Upvotes: 0
Reputation: 334
Try this one:
(.*)S. MOORE1;{0,1}(.*)
It captures two groups: the text before "S. MOORE1" and the text after it, with an optional semicolon in between.
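To see what the two groups contain, here is a small sketch in Python (an assumed language; the variable names are illustrative and the sample string is shortened from the question's input).

```python
import re

# Pattern from the answer, unchanged. Note that the unescaped '.'
# after "S" matches any character, not only a literal period.
NAME_RE = re.compile(r"(.*)S. MOORE1;{0,1}(.*)")

text = "K. CAMPBELL2,3,4,5, S. MOORE1;1Pathology, 2Mol. Physiol. and Biophysics"
match = NAME_RE.match(text)
before, after = match.group(1), match.group(2)

print(repr(before))
print(repr(after))
```

The matched name itself ends up in neither group, so it has to be appended to the first group again to rebuild the full names line, and the pattern is tied to this one author name.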
Upvotes: 0