Ascension

Reputation: 19

Regex - Remove everything before first comma and everything after second comma in line

I have the following string:

55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"
56,100hoch3.de,47,2757361,2,"0.00 %",0,"0.00 %","2016-02-11 00:42:10"

I want to remove everything before the first comma: 55, and 56,

AND everything after the second comma.

The result should look like this, where only the domain name is left:

1001wuensche.com
100hoch3.de

I'm using Notepad++ to accomplish this. Anybody got an idea? Thanks for your help in advance!

Upvotes: 1

Views: 7679

Answers (3)

Mike Robinson

Reputation: 8945

Another way to do this sort of thing (in a more general sense) is to split the line by commas into an array, then take only the second element of that array.
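As a rough illustration of the split approach, here is a minimal Python sketch. It assumes the first two fields never contain a quoted comma (true for the sample lines in the question); the helper name second_field is made up for this example.

```python
# Sample lines copied from the question.
lines = [
    '55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"',
    '56,100hoch3.de,47,2757361,2,"0.00 %",0,"0.00 %","2016-02-11 00:42:10"',
]


def second_field(line):
    """Return the text between the first and second comma (hypothetical helper)."""
    # maxsplit=2 stops splitting after the second comma, so the rest of the
    # line (including the quoted fields) is never touched.
    parts = line.split(",", 2)
    # Lines without any comma have no second field; return an empty string.
    return parts[1] if len(parts) > 1 else ""


domains = [second_field(line) for line in lines]
print(domains)  # ['1001wuensche.com', '100hoch3.de']
```

If the data could contain quoted commas in the early fields, a real CSV parser would be a safer choice than a plain split.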

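Yet another way to do it is to run two "substitute" regexes, one explicitly anchored to the beginning of the line and the other to the end (the first being non-"greedy"), e.g. in Perl-style syntax:

s/^.*?,//

s/,.*$//

In Notepad++ that means two Replace All passes in "Regular expression" mode with an empty replacement: first search for ^.*?, and then for ,.*$ (leave ". matches newline" unchecked).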

The concept of "greediness" is quite important here. In the first substitution we want to match as few characters as possible, so that the match stops at the first comma that is encountered (hence, "non-greedy"). In the second, you do want to "greedily" match (and replace with the empty string) the biggest span that you can find: namely, everything from the next comma to the end of the line.
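To see the two substitutions in action outside of an editor, here is a small Python sketch using the standard re module. The function name extract_domain is made up for illustration, and it assumes every line contains at least two commas.

```python
import re


def extract_domain(line):
    """Apply the two substitutions in order (hypothetical helper)."""
    # Non-greedy: delete as little as possible up to and including the
    # FIRST comma, i.e. the leading "55," part.
    line = re.sub(r"^.*?,", "", line, count=1)
    # Greedy: delete from the first remaining comma to the end of the line.
    line = re.sub(r",.*$", "", line, count=1)
    return line


sample = '55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"'
print(extract_domain(sample))  # 1001wuensche.com
```

Swapping the first pattern for the greedy ^.*, would eat everything up to the last comma instead, which is exactly the mistake the non-greedy quantifier avoids.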

Find the simplest and most obvious way to do it, because, quite inevitably, someone's going to want to change this logic someday. Or, someone will hand you a file that breaks your "clever, elegant" approach. Think "testable, and maintainable."

Upvotes: 1

Sebastian Proske

Reputation: 8413

You could search for ^[^,]+,([^,]+).* and replace it with $1.

If there is a chance of malformed lines (an empty field before the first comma, or lines without any comma), you could use a stricter pattern like ^[^,\r\n]*,([^,\r\n]+).+ instead. Excluding \r and \n from the character classes keeps a match from running over into the next line.
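For anyone who wants to try the stricter pattern outside Notepad++, here is a hedged Python sketch. Note that Python writes the backreference as \1 rather than $1, and needs re.MULTILINE so that ^ matches at the start of every line. The last two input lines are invented test cases, not data from the question.

```python
import re

text = "\n".join([
    '55,1001wuensche.com,0,354137264,1,"0.00 %",0,"0.00 %","2016-04-24 09:00:24"',
    '56,100hoch3.de,47,2757361,2,"0.00 %",0,"0.00 %","2016-02-11 00:42:10"',
    "no comma here",     # invented: a line without any comma
    ",empty.example,1",  # invented: an empty first field
])

# The stricter pattern from the answer; [^,\r\n] cannot cross a line break.
pattern = re.compile(r"^[^,\r\n]*,([^,\r\n]+).+", re.MULTILINE)

# Replace each matching line with its first capture group.
result = pattern.sub(r"\1", text)
print(result)
```

Lines that do not match (such as the one without a comma) are left untouched, so they are easy to spot afterwards.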

Upvotes: 2

Will Barnwell

Reputation: 4089

^.*?,(.*?),.*$

The capture group $1 will be everything between the first two commas, so search for the pattern above and replace with $1.
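The same pattern can also be used to pull the group out directly instead of doing a replacement. A minimal Python sketch, where the function name between_first_two_commas is made up and returning None for non-matching lines is just one possible choice:

```python
import re

# Both .*? parts are non-greedy, so the group ends at the second comma.
PATTERN = re.compile(r"^.*?,(.*?),.*$")


def between_first_two_commas(line):
    """Return capture group 1, or None if the line has fewer than two commas."""
    match = PATTERN.match(line)
    return match.group(1) if match else None


print(between_first_two_commas('56,100hoch3.de,47,2757361,2,"0.00 %"'))  # 100hoch3.de
print(between_first_two_commas("only,one comma"))  # None
```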

Upvotes: 4
