Burfi

Reputation: 229

Split String Using Regular Expression

I have the following string:

*H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1
1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA

The expected output is:

1)  *H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1
2)  1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA

The string above combines author names and addresses.
Sometimes a semicolon (;) follows the last name, i.e. S. MOORE1; and sometimes it does not, i.e. S. MOORE1

I tried the regex below, but it's not giving the expected results. Please help, as I am still learning regex.

;?[\d*]\w+

The pattern is:

A word followed by a digit, followed by a semicolon or a space, followed by a digit, followed by words. For example: S. MOORE1(; or space)1Pathology. I need to split this into S. MOORE1 and 1Pathology.

Thanks

Upvotes: 2

Views: 194

Answers (3)

Laurence

Reputation: 1713

Try this one:

(?<=\w\d)[; ](?=\d\w)

It matches a ; or a space that is preceded by a word character and a digit, and followed by a digit and a word character.
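As a rough illustration, here is how that pattern could be used to split the sample in Python with `re`. The function name `split_record` is made up for this sketch, and it assumes the two parts are joined by exactly one space or one semicolon.

```python
import re

# Separator: one ';' or ' ' sitting between "<word char><digit>" and "<digit><word char>".
SEPARATOR = re.compile(r"(?<=\w\d)[; ](?=\d\w)")

authors = "*H. NGUYEN1, J. SATZ2,3,4,5, R. TURK2,3,4,5, K. CAMPBELL2,3,4,5, S. MOORE1"
affiliations = (
    "1Pathology, 2Mol. Physiol. and Biophysics, 3Neurol., 4Intrnl. Med., "
    "Univ. of Iowa, Iowa City, IA; 5Howard Hughes Med. Inst., Iowa City, IA"
)


def split_record(text):
    """Split at the first separator only (hypothetical helper for this example)."""
    return SEPARATOR.split(text, maxsplit=1)


if __name__ == "__main__":
    for joiner in (" ", ";"):
        for part in split_record(authors + joiner + affiliations):
            print(part)
```

Note that a semicolon followed by a space ("S. MOORE1; 1Pathology") is not split by this version, because the class `[; ]` consumes only a single character.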

Edit: this version also accounts for a comma before the last digit, a semicolon followed by a space, and possible newline characters:

(?<=[\w,]\d)[; ]+[\r\n\f]*(?=\d\w)
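A minimal sketch of the extended pattern in Python, assuming the input may end the author list with something like "2,3,4,5" and may put "; " plus a line break before the addresses. The helper name `split_authors_affiliations` is hypothetical.

```python
import re

# Lookbehind now also accepts ",<digit>" (e.g. the end of "CAMPBELL2,3,4,5").
# The separator is one or more ';' or ' ' characters, optionally followed by
# line-break characters, and must be followed by "<digit><word char>".
SEPARATOR_V2 = re.compile(r"(?<=[\w,]\d)[; ]+[\r\n\f]*(?=\d\w)")


def split_authors_affiliations(text):
    """Split at the first separator only (hypothetical helper for this example)."""
    return SEPARATOR_V2.split(text, maxsplit=1)


if __name__ == "__main__":
    sample = "K. CAMPBELL2,3,4,5; \n1Pathology, 2Mol. Physiol."
    for part in split_authors_affiliations(sample):
        print(part)
```

One thing to watch: `[; ]+` still requires at least one semicolon or space, so two parts separated by a bare line break alone would not be split.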

You can also use Expresso to test regular expressions.

Upvotes: 1

Michal Klouda

Reputation: 14521

I have read your description many times, but I don't find it clear.

My best guess is that you need to break the line before a word that starts with '1' and has a capital letter as its second character, which is as simple as:

1[A-Z]
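One possible way to apply this in Python is to search for the first `1[A-Z]` and cut the string there. This is only a sketch: the function name is made up, and it assumes the first address is always numbered 1, starts with a capital letter, and that no author marker looks like "1A".

```python
import re

FIRST_AFFILIATION = re.compile(r"1[A-Z]")


def split_before_first_affiliation(text):
    """Cut the text just before the first '1' + capital letter (hypothetical helper)."""
    match = FIRST_AFFILIATION.search(text)
    if match is None:
        return [text]  # nothing that looks like "1Pathology" was found
    # Drop a trailing ';', space, or line break left over on the author part.
    before = text[:match.start()].rstrip("; \r\n")
    return [before, text[match.start():]]


if __name__ == "__main__":
    sample = "K. CAMPBELL2,3,4,5, S. MOORE1; 1Pathology, 2Mol. Physiol."
    for part in split_before_first_affiliation(sample):
        print(part)
```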

Upvotes: 0

Martin

Reputation: 334

Try this one:

(.*)S\. MOORE1;?(.*)

This captures two groups: the text before and the text after "S. MOORE1". Note that it only works when the last author's name is known in advance.
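For illustration, the same idea could look like this in Python, with the dot escaped so it matches a literal period. The function name and the `last_author` parameter are invented for this sketch; the name is not part of either captured group, so it is added back to the first part.

```python
import re


def split_on_last_author(text, last_author="S. MOORE1"):
    """Split around a known last author name (hypothetical helper).

    Returns (authors, addresses), or None when the name is not found.
    """
    pattern = re.compile(
        r"(.*)" + re.escape(last_author) + r";?(.*)",
        re.DOTALL,  # let '.' cross line breaks between the two parts
    )
    match = pattern.match(text)
    if match is None:
        return None
    # The name itself is matched outside the groups, so re-attach it.
    return match.group(1) + last_author, match.group(2).strip()


if __name__ == "__main__":
    print(split_on_last_author("R. TURK2, S. MOORE1; 1Pathology, 2Mol."))
```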

Upvotes: 0
