Reputation: 285
What I have:
I wrote a simple PowerShell script that replaces the contents of the text files and rewrites them (UTF-8 encoding is crucial):
((Get-Content -path *.adoc -Raw -Encoding utf8) -replace '\[.dfn .term]#.*#','[.dfn .term]_.*_') | Set-Content -Path *.adoc -Encoding utf8
When I ran the script like this, I found that the replacement string is treated as plain text rather than as a regex, so .* ends up in the output literally.
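A minimal reproduction of that behaviour on a single string, as a sketch (the $sample and $result names are only for illustration). The right-hand side of -replace is a substitution string, not a pattern, so .* is copied into the output verbatim:

```powershell
# Sample value taken from the example text in this question
$sample = '[.dfn .term]#Words#'

# The pattern matches, but the replacement is inserted as literal text
$result = $sample -replace '\[.dfn .term]#.*#', '[.dfn .term]_.*_'

$result   # [.dfn .term]_.*_
```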
What I want to achieve is:
Find a line that begins with [.dfn .term], has any number of characters between # and #, and replace # with _, leaving [.dfn .term] and everything between the # characters unchanged.
I can't replace all # with _ because there can also be text like [.keyword]#something#, where # needs to be replaced with *. Also, something can be anything: a word or a phrase.
Dealing with patterns and RegEx groups is outside my knowledge. I would appreciate any help.
Example:
I have: A sentence is a string of [.dfn .term]#Words# that has a finished [.keyword]#Thought#. Sentences form [.dfn .term]#Paragraphs#. [.dfn .term]#Paragraphs# form text. Text is cool.
I want to have: A sentence is a string of [.dfn .term]_Words_ that has a finished [.keyword]*Thought*. Sentences form [.dfn .term]_Paragraphs_. [.dfn .term]_Paragraphs_ form text. Text is cool.
Upvotes: 0
Views: 364
Reputation: 17007
Use regexes with capture groups:
$lines = Get-Content -Path C:\file.txt -Encoding UTF8 -Raw
$option = [System.Text.RegularExpressions.RegexOptions]::Singleline
$pattern1 = [regex]::new("(\[\.dfn \.term])#(.*?)#", $option)
# Careful: the single quotes matter here; in double quotes PowerShell would try to expand $1 and $2 as variables
$lines = $pattern1.Replace($lines, '$1_$2_')
$pattern2 = [regex]::new("(\[what you want])#(.*?)#", $option)
$lines = $pattern2.Replace($lines, '$1*$2*')
$lines | Set-Content -Path C:\result.txt -Encoding UTF8
test file:
[.dfn .term]#azaeaeae#
[.dfn .term]#errrr# sqsqsqs
[.dfn .term]#errrr# sqsqsqs
eaeaeaeae
aeaeae
[.dfn .term]#errrr# [.keyword]#something# #errrr#
Result (with the second pattern set to .keyword):
[.dfn .term]_azaeaeae_
[.dfn .term]_errrr_ sqsqsqs
[.dfn .term]_errrr_ sqsqsqs
eaeaeaeae
aeaeae
[.dfn .term]_errrr_ [.keyword]*something* #errrr#
You could also write it with chained -replace operators:
$lines = (Get-Content -path C:\yourfile.txt -Raw -Encoding utf8) `
-replace '(\[\.dfn \.term])#(.*?)#', '$1_$2_' `
-replace '(\[\.keyword])#(.*?)#', '$1*$2*'
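Since the question is about rewriting several .adoc files, here is a rough sketch of applying the same chained -replace to each file separately. The function name Update-AdocFiles and its -Folder parameter are made up for this example, and it assumes the files sit directly in one folder. Each file is read and written on its own, so the contents of different files are never mixed together.

```powershell
# Hypothetical helper: rewrites every .adoc file in $Folder in place
function Update-AdocFiles {
    param([string]$Folder = '.')

    Get-ChildItem -Path $Folder -Filter *.adoc | ForEach-Object {
        # -Raw returns the whole file as one string
        $text = Get-Content -LiteralPath $_.FullName -Raw -Encoding UTF8

        $text = $text -replace '(\[\.dfn \.term])#(.*?)#', '$1_$2_' `
                      -replace '(\[\.keyword])#(.*?)#', '$1*$2*'

        # -NoNewline: the raw text already carries its own line endings
        Set-Content -LiteralPath $_.FullName -Value $text -Encoding UTF8 -NoNewline
    }
}
```

It would be called as Update-AdocFiles -Folder C:\docs (the path is a placeholder). Try it on a copy of the files first. In Windows PowerShell 5.1, -Encoding UTF8 writes a byte order mark, so check whether that matters for your toolchain.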
You could use named groups if you prefer:
$pattern1 = [regex]::new("(?<begin>\[\.dfn \.term])#(?<text>.*?)#", $option)
# Careful: the single quotes matter here; in double quotes PowerShell would try to expand ${begin} and ${text} as variables
$lines = $pattern1.Replace($lines, '${begin}_${text}_')
If you have many different patterns, you could put them in a hashtable:
$patterns = @{
    '(\[\.dfn \.term])#(.*?)#' = '$1_$2_' ;
    '(\[\.keyword])#(.*?)#' = '$1*$2*'
}
$option = [System.Text.RegularExpressions.RegexOptions]::Singleline
foreach($k in $patterns.Keys){
    # Build each regex from the key and apply the replacement stored as its value
    $pat = [regex]::new($k, $option)
    $lines = $pat.Replace($lines, $patterns[$k])
}
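A plain @{} hashtable does not guarantee the order in which its keys are enumerated. That is harmless for these two patterns, but if one replacement could affect the input of another, an [ordered] dictionary keeps the rules in the order they were written. A small self-contained sketch (the function name Convert-AdocMarkup and the $rules variable are invented for illustration):

```powershell
# Hypothetical helper that applies the rules in a fixed order
function Convert-AdocMarkup {
    param([string]$Text)

    # [ordered] preserves insertion order when enumerating the keys
    $rules = [ordered]@{
        '(\[\.dfn \.term])#(.*?)#' = '$1_$2_'
        '(\[\.keyword])#(.*?)#'    = '$1*$2*'
    }

    foreach ($pattern in $rules.Keys) {
        $Text = $Text -replace $pattern, $rules[$pattern]
    }
    $Text
}

Convert-AdocMarkup 'a [.dfn .term]#Words# b [.keyword]#Thought#.'
# a [.dfn .term]_Words_ b [.keyword]*Thought*.
```

Unlike the Singleline version above, plain -replace does not let . match a newline, so this sketch only handles terms that start and end on the same line.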
Upvotes: 1
Reputation: 366
You want to create a regexp that matches just the # symbols that directly follow [.dfn .term] or sit at the end of the line.
Here's an example:
"[.dfn .term]# everything between #" -replace "(?<=\[\.dfn \.term\])#|#$", "_"
...which results in: [.dfn .term]_ everything between _
Here's how it breaks down:
(?<=\[\.dfn \.term\]) - looks for [.dfn .term] immediately before the current position, but does not include it in the match. This is called a positive lookbehind.
# - matches the pound sign
| - OR
#$ - matches the pound sign at the end of the line
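Note that the #$ branch only fires when the closing # is the very last character. As a quick check against a sentence shaped like the one in the question (a sketch; the variable names are made up), the closing # is followed by a period, so only the opening # gets replaced, whereas a capture-group pattern handles both:

```powershell
$sample = 'Sentences form [.dfn .term]#Paragraphs#. Text is cool.'

# Lookbehind plus end-of-line anchor: the closing # is not at the end, so it stays
$lookbehindOnly = $sample -replace '(?<=\[\.dfn \.term\])#|#$', '_'
$lookbehindOnly   # Sentences form [.dfn .term]_Paragraphs#. Text is cool.

# Capture groups: match the prefix, the opening #, the text and the closing # together
$withGroups = $sample -replace '(\[\.dfn \.term\])#(.*?)#', '$1_$2_'
$withGroups       # Sentences form [.dfn .term]_Paragraphs_. Text is cool.
```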
Upvotes: 0