Ana Franco

Reputation: 1801

Replacing the nth word with regex

I have a CSV file where I need to replace the 7th and the 8th "cell" with a value from a list. I'm trying to do the replacement with a regex, but I'm having trouble defining the regex.

I tried (\"\w+\").*?(\"\w+\"){7}, but without the {7} the regex just joins 2 cells into one match.

(?:[^\"]\w+\") just leaves the first " behind.

And (\"\w+\")?(\"\w+\") matches every cell.

What I need is given the following line:

"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"

I need to get the 7th word, which would be "bye".

Please help, thanks

Upvotes: 2

Views: 2051

Answers (5)

Wiktor Stribiżew

Reputation: 626845

Your regexps do not work because:

  • (\"\w+\").*?(\"\w+\"){7} - matches ", 1 or more word characters, ", then 0 or more characters other than a newline (as few as possible), then 7 consecutive occurrences of a quoted word-character string. That requires 8 quoted values in total, so it would run past the 7th value. Because {7} allows nothing between the repetitions, it also cannot match fields that are separated by ;.
  • (?:[^\"]\w+\") - matches 1 character that is not " (with [^"]), then 1 or more word characters (with \w+), and then a ". This one can only match a single field that contains no spaces, and it leaves the opening " out of the match.
  • (\"\w+\")?(\"\w+\") - matches 1 or 0 quoted word-character substrings, and then 1 quoted word-character substring. Again, this will not get you to the 7th field.
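To see concretely what each attempt does, here is a small sketch that runs the three patterns against the sample line. The variable names are made up for illustration, and \" is written as a plain " because the two are equivalent inside a JavaScript regex literal.

```javascript
// Sample line from the question.
const sample = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';

// The three attempted patterns, as JavaScript regex literals.
const attempts = [
  /("\w+").*?("\w+"){7}/,
  /(?:[^"]\w+")/,
  /("\w+")?("\w+")/,
];

// First match of each pattern, or null when the pattern finds nothing.
const firstMatches = attempts.map(function (re) {
  const m = sample.match(re);
  return m ? m[0] : null;
});

console.log(firstMatches);
// [ null, 'b34"', '"b34"' ]
```

The first pattern finds nothing because {7} needs seven quoted strings back to back with no ; between them. The second drops the opening quote, and the third stops after a single cell.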

Note that you do not need the /i modifier since you are not using any letters in the pattern. The case-insensitive modifier is only necessary when you need to make your regex pattern case-insensitive (i.e. match E and e with e or E).

You can use

/^("[^"]*"(?:;"[^"]*"){5};)"[^"]*"/

Or, contracting it a bit further (since ; is always expected on the right):

/^((?:"[^"]*";){6})"[^"]*"/

The regex matches

  • ^ - start of string
  • ((?:"[^"]*";){6}) - (capture group 1) - exactly 6 occurrences (due to the limiting quantifier {6}) of a "..."-like quoted substring followed by a ;
  • "[^"]*" - a quoted substring that does not have a " inside it (the 7th field value).

Here is a working JS snippet:

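$("#button").click(function() {
    var line = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';

    // Group 1 holds the first six fields, each with its trailing ";"
    var regex = /^((?:"[^"]*";){6})"[^"]*"/;

    // Put the first six fields back and swap the 7th for "hello"
    var result = line.replace(regex, '$1"hello"');
    $("#result").val(result);
});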
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.4/jquery.min.js"></script>
<div id="palabra">"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"</div><br/>

<div>
<input type="text" id="result" value="" style="width:400px;"/>
</div><br/>

<div id="palabra">Expected: "b34";"es";"ee";"beer";"beers";"34421";"hello";"bi";"buuu";"fffs"</div><br/>
<button id="button" >Run regex</button>
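Since you need to change both the 7th and the 8th cell with values from a list, the same regex can be built with a variable repeat count. This is only a sketch: replaceNthField and newValues are hypothetical names that are not part of your code, and it assumes no field contains a " inside it.

```javascript
// Hypothetical helper: replace the n-th quoted field (1-based) of a
// semicolon-separated line with a new quoted value.
function replaceNthField(line, n, value) {
  // Same pattern as above, with the repeat count computed from n.
  const regex = new RegExp('^((?:"[^"]*";){' + (n - 1) + '})"[^"]*"');
  // A replacer function keeps any "$" in value from being read as a
  // replacement pattern such as $1 or $&.
  return line.replace(regex, function (match, prefix) {
    return prefix + '"' + value + '"';
  });
}

const line = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';
const newValues = ['hello', 'world']; // assumed replacement list

let result = replaceNthField(line, 7, newValues[0]);
result = replaceNthField(result, 8, newValues[1]);

console.log(result);
// "b34";"es";"ee";"beer";"beers";"34421";"hello";"world";"buuu";"fffs"
```

If the line has fewer than n fields, the regex does not match and the line comes back unchanged.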

Upvotes: 1

lintmouse

Reputation: 5119

Okay, I made a change to your regex and replace function.

var regex = /^((?:"[^"]*";){6})"[^"]*"/i;

var result = line.replace(regex, '$1' + replacement);      

The $1 refers to the first capture group, which holds everything in the first six cells.

Try it out here:

https://jsfiddle.net/L19ru04s/

This is the pattern:

^((?:"[^"]*";){6})"[^"]*"

https://regex101.com/r/lR5xU5/1

It looks for six occurrences of "TEXT"; and puts them in a capture group, then matches the next cell outside of the capture group. That way, when the first seven cells are replaced, the capture group puts the first six cells back unchanged.

Upvotes: 1

Craig Estey

Reputation: 33601

Take advantage of your data to simplify the regex:

/([^;]+;){6}"([^"]+)"/

Now $2 holds bye (the 7th field, without its quotes)
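In JavaScript, $2 is what you would write in a replacement string. If you only want to read the value, the same group is index 2 of the match array. A minimal sketch, with variable names chosen just for this example:

```javascript
const row = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';

// Group 1 is overwritten on each of the six repetitions;
// group 2 is the 7th field without its quotes.
const found = row.match(/([^;]+;){6}"([^"]+)"/);
const seventh = found ? found[2] : null;

console.log(seventh);
// bye
```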

Upvotes: 1

Jeff Y

Reputation: 2456

It looks like you want to capture the "..." that occurs with 6 non-captured instances of "..."; before it:

(?:"[^"]*";){6}("[^"]*")

To capture the 8th alone, change the 6 to a 7. To capture both the 7th and the 8th:

(?:"[^"]*";){6}("[^"]*");("[^"]*")

https://regex101.com/r/hL0vD9/1
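As a rough illustration of the two-capture version in JavaScript (variable names are only for this example), note that both captures keep their surrounding quotes:

```javascript
const csvLine = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';

// Six non-captured fields, then the 7th and 8th as groups 1 and 2.
const both = csvLine.match(/(?:"[^"]*";){6}("[^"]*");("[^"]*")/);

const field7 = both ? both[1] : null;
const field8 = both ? both[2] : null;

console.log(field7, field8);
// "bye" "bi"
```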

Upvotes: 0

miken32

Reputation: 42724

This should do the trick:

^((?:".*?";){6})"(.*?)";"(.*)$

The first capture is the first six fields, the second is field seven without its quotes, and the third is everything after the opening quote of field eight.

https://regex101.com/r/hV2oS7/2
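A sketch of how the three captures could be used for a replacement in JavaScript. The "hello" value and the variable names are assumptions for the example. Because the pattern consumes the quotes around field seven and the opening quote of field eight, the replacement string has to put them back:

```javascript
const input = '"b34";"es";"ee";"beer";"beers";"34421";"bye";"bi";"buuu";"fffs"';
const pattern = /^((?:".*?";){6})"(.*?)";"(.*)$/;

// parts[1]: first six fields, parts[2]: field seven, parts[3]: the rest.
const parts = input.match(pattern);
console.log(parts[2]);
// bye
console.log(parts[3]);
// bi";"buuu";"fffs"

// Rebuild the line around a new 7th value, restoring the consumed quotes.
const replaced = input.replace(pattern, '$1"hello";"$3');
console.log(replaced);
// "b34";"es";"ee";"beer";"beers";"34421";"hello";"bi";"buuu";"fffs"
```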

Upvotes: 1
