Reputation: 153
I really have tried to solve this myself but have been bashing my head against a brick wall with this one.
I have a file with many rows like this:
<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>
I want a regexp to return just the string between the name=" and the next "
In this case, it's 'Net Salary per month € (3rd Applicant)' but it could be anything. That's what I meant by extracting a variable substring.
Thanks in advance.
Upvotes: 2
Views: 7183
Reputation: 437082
There are helpful regexes in the existing answers; using one with the -replace
operator allows you to extract the information of interest in a single operation:
$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>'
# Extract the "name" attribute value.
# Note how the regex is designed to match the *full line*, which is then
# replaced with what the first (and only) capture group, (...), matched, $1
$line -replace '^.+ name="([^"]*).+', '$1'
This outputs a string with verbatim content Net Salary per month € (3rd Applicant).
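Since the file has many such rows, here is a minimal sketch of applying the same -replace to several lines at once. One caveat: -replace returns its input unchanged when the regex doesn't match, so lines without a name attribute are filtered out first with -match. The $lines array and its shortened sample rows are assumptions made for illustration; in practice the lines would come from your file.

```powershell
# Hypothetical sample lines (shortened); the second one has no name attribute.
$lines = @(
    '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description=""/>'
    '<outputs>'
    '<outputColumn id="427" name="Net Salary per month € (4th Applicant)" description=""/>'
)

# With an array on the left-hand side, -match acts as a filter and
# -replace is applied to each remaining element.
$names = @(($lines -match ' name="') -replace '^.+ name="([^"]*).+', '$1')

$names
```

Without the -match filter, the '<outputs>' line would pass through to the result unmodified.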
Taking a step back: Your sample line is a valid XML element, and it's always preferable to use a dedicated XML parser.
Parsing each line as XML will be slow, but perhaps you can parse the entire file, which offers a simple solution using PowerShell's property-based adaptation of the XML DOM, via the [xml] type (System.Xml.XmlDocument):
$fileContent = @'
<xml>
<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>
<outputColumn id="427" name="Net Salary per month € (4th Applicant)" description="" lineageId="426" precision="0" scale="0" length="255" dataType="wstr" codePage="0" sortKeyPosition="0" comparisonFlags="0" specialFlags="0" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" truncationRowDisposition="FailComponent" externalMetadataColumnId="425" mappedColumnId="0"/>
</xml>
'@
([xml] $fileContent).xml.outputColumn.name
The above yields the "name" attribute values across all <outputColumn> elements:
Net Salary per month € (3rd Applicant)
Net Salary per month € (4th Applicant)
Upvotes: 0
Reputation: 7152
Because there are a lot of '"' characters after name, you will probably need a lazy quantifier. Try:
^.*name=\"(.+?)\".*$
This matches the whole line and should give you what you want in the group (.+?)
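As a sketch of how this pattern might be used in PowerShell (an assumption, since this answer names no language), the -match operator stores capture groups in the automatic $Matches variable. The shortened sample lines below are made up for illustration. Note one edge case: because .+? requires at least one character, an empty name="" attribute makes the group run on to the next quote, whereas [^"]* handles it.

```powershell
$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description=""/>'

# On a successful match, $Matches[1] holds what the first capture group matched.
$name = if ($line -match '^.*name="(.+?)".*$') { $Matches[1] }

# Hypothetical line with an empty name attribute.
$emptyLine = '<c name="" description="x"/>'

# .+? must consume at least one character, so it overshoots the empty value.
$lazy = if ($emptyLine -match '^.*name="(.+?)".*$') { $Matches[1] }

# [^"]* can match zero characters and stops at the closing quote.
$negated = if ($emptyLine -match 'name="([^"]*)"') { $Matches[1] }

$name
```

Here $lazy ends up containing '" description=' while $negated is the empty string, which is why the negated character class is the safer choice if the attribute can be empty.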
Upvotes: 0
Reputation: 4860
This may help:
Regex = name="(.*?)"
DEMO
https://regex101.com/r/uF4oY4/51
Let me know if it helps.
Upvotes: 2
Reputation: 67968
(?<=name=")[^"]*
This should do it for you. See the demo:
https://regex101.com/r/uF4oY4/50
If your regex engine doesn't support lookarounds, use
name="([^"]*)
and grab group 1.
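For comparison, here is a minimal sketch of both variants using the .NET [regex] class from PowerShell (an assumption; this answer doesn't name a language). .NET supports lookbehind, so with the first pattern the whole match is the attribute value, while the second needs group 1. The shortened sample line is for illustration only.

```powershell
$line = '<outputColumn id="426" name="Net Salary per month € (3rd Applicant)" description="" lineageId="426"/>'

# Lookbehind variant: the match itself is the attribute value.
$viaLookbehind = [regex]::Match($line, '(?<=name=")[^"]*').Value

# Capture-group variant: the value is in group 1.
$viaGroup = [regex]::Match($line, 'name="([^"]*)').Groups[1].Value

$viaLookbehind
$viaGroup
```

Both patterns would also match inside any other attribute whose name ends in "name"; prefixing the pattern with a space, as in ' name="', is one way to guard against that.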
Upvotes: 2