Reputation: 1801

Replacing the nth word with regex

I have a csv file where I need to change the 7th and the 8th "cell" with a value from a list. I'm trying to do a replace with a regex but I'm having trouble defining the regex.

I tried (\"\w+\").*?(\"\w+\"){7} but without the number the regex appends the 2 cells.

(?:[^\"]\w+\") just leaves the first " behind.

And (\"\w+\")?(\"\w+\") matches every cell.

What I need is given the following line:

"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"

Get the 7th word that would be "bye"

Please help, thanks

Upvotes: 2

Answers (5)

Wiktor Stribiżew

Reputation: 626845

Your regexps do not work because:

(\"\w+\").*?(\"\w+\"){7} - matches ", 1 or more word characters (letters, digits or _), ", then 0 or more (but as few as possible) characters other than a newline, then 7 occurrences of a quoted word in a row. Those 7 occurrences must be adjacent, with no ; between them, so this cannot isolate the 7th value; and even if separators were allowed, you would already be past the 7th value, since the pattern requires 8 values in total.
(?:[^\"]\w+\") - matches 1 character that is not " (with [^"]), then 1 or more word characters (with \w+) and then a ". This one only matches a single field with no spaces, and the match starts after the opening ", which is why the first " is left behind.
(\"\w+\")?(\"\w+\") - this matches 1 or 0 quoted alphanumeric substrings, and then 1 quoted alphanumeric substring. Again, this will not get you to the 7th field.
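To see these three behaviours concretely, here is a minimal sketch that runs each attempt against the sample line from the question. The variable names (first, second, third) are only for illustration, and the g flag on the third pattern is my assumption about how it was being applied when it "matched every cell".

```javascript
var line = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';

// Attempt 1: {7} needs seven quoted strings back to back,
// but the cells in this line are separated by ; so nothing matches.
var first = line.match(/("\w+").*?("\w+"){7}/);

// Attempt 2: [^"] consumes the b, so the match starts after the opening quote.
var second = line.match(/(?:[^"]\w+")/);

// Attempt 3: with the g flag (assumed), every cell is matched on its own.
var third = line.match(/("\w+")?("\w+")/g);

console.log(first);        // null
console.log(second[0]);    // b34"
console.log(third.length); // 10
```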

Note that you do not need the /i modifier since you are not using any letters in the pattern. The case-insensitive modifier is only necessary when the pattern contains letters that should match regardless of case (i.e. match E and e with e or E).

You can use

/^("[^"]*"(?:;"[^"]*"){5};)"[^"]*"/

Or, contracting it a bit further (since ; is always expected on the right):

/^((?:"[^"]*";){6})"[^"]*"/

The regex matches

^ - start of string
((?:"[^"]*";){6}) - (capture group 1) - exactly 6 occurrences (due to the limiting quantifier {6}) of a "..."-like quoted substring followed by a ;
"[^"]*" - a quoted substring that does not have a " inside it (the 7th field value).

Here is a working JS snippet:

$("#button").click(function() {
    var line = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';

    var regex = /^((?:"[^"]*";){6})"[^"]*"/;

    var result = line.replace(regex, '$1"hello"');
    $("#result").val(result);
});

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>
<div id="palabra">"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"</div><br/>

<div>
<input type="text" id="result" value="" style="width:400px;"/>
</div><br/>

<div id="palabra">Expected: "b34";"es";"ee";"beer";"beers";"34421";"hello";"bi";"buuu";"fffs"</div><br/>
<button id="button" >Run regex</button>
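
Since the question mentions changing both the 7th and the 8th cell, the same pattern can be built for any position. The following is only a sketch under the assumption that every cell is quoted and contains no " itself; replaceField is a hypothetical helper name, not something from the question or from a library.

```javascript
// Hypothetical helper: replace the n-th (1-based) quoted cell of a ;-separated line.
function replaceField(line, n, value) {
    // Skip the first n-1 cells (each one is "..." followed by ;), then match the n-th.
    var regex = new RegExp('^((?:"[^"]*";){' + (n - 1) + '})"[^"]*"');
    // A replacer function keeps $-sequences inside value from being interpreted.
    return line.replace(regex, function (match, prefix) {
        return prefix + '"' + value + '"';
    });
}

var line = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';
var updated = replaceField(replaceField(line, 7, 'hello'), 8, 'world');
console.log(updated);
// "b34";"es";"ee";"beer";"beers";"34421";"hello";"world";"buuu";"fffs"
```

If the line has fewer than n cells, the regex does not match and the line is returned unchanged.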

Upvotes: 1

lintmouse

Reputation: 5119

Okay, I made a change to your regex and replace function.

var regex = /^((?:"[^"]*";){6})"[^"]*"/i;

var result = line.replace(regex, '$1' + replacement);

The $1 in the replacement string inserts the contents of the first capture group, which is everything in the first six cells.

Try it out here:

https://jsfiddle.net/L19ru04s/

This is the pattern:

^((?:"[^"]*";){6})"[^"]*"

https://regex101.com/r/lR5xU5/1

It looks for six occurrences of "TEXT"; and puts them in a capture group, then matches the next cell outside of the capture group. That way, although the match covers the first seven cells, we can use the capture group to put the first six cells back unchanged.
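
If the goal is only to read the 7th word rather than replace it, one possible variation (a sketch, not part of the fiddle above) is to add a second capture group around the text inside the 7th cell's quotes. The names match and seventh are just for illustration.

```javascript
var line = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';

// Group 1: the first six cells; group 2: the text inside the 7th cell's quotes.
var match = /^((?:"[^"]*";){6})"([^"]*)"/.exec(line);

// exec returns null when the line has fewer than seven cells.
var seventh = match ? match[2] : null;
console.log(seventh); // bye
```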

Upvotes: 1