Reputation: 99
I am trying to extract IP addresses AND the surrounding text from a file, not just the IP address. This is my current pattern:
(\w\b)(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)(\w\b)(\w\b)
Input data: 23E42B42 93.30.66.103 1535875201 0
Expected:
Group1 23E42B42
Group2 93.30.66.103
Group3 1535875201
Group4 0
Upvotes: 1
Views: 473
Reputation: 163447
In your pattern you have to use a quantifier after \w to match one or more word characters: \w+. Note that \w itself does not match spaces, so you have to add them to the pattern to match them literally.
You can omit the \b before the space, because there is always a word boundary between \w and a space.
You might use a somewhat more specific match using \d for the digits:
^([A-Z0-9]+) (\d{1,3}(?:\.\d{1,3}){3}) (\d+) (\d+)$
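As a rough illustration of how this pattern could be applied in JavaScript, the sketch below runs it against the sample line from the question. The variable names (`pattern`, `line`, `groups`) are placeholders and not part of the original post.

```javascript
// Pattern from the answer above; `line` is the sample input from the question.
const pattern = /^([A-Z0-9]+) (\d{1,3}(?:\.\d{1,3}){3}) (\d+) (\d+)$/;
const line = "23E42B42 93.30.66.103 1535875201 0";

// match() returns null when the line does not fit the pattern.
const result = line.match(pattern);
// Index 0 is the whole match; indices 1 to 4 are the capturing groups.
const groups = result ? result.slice(1) : [];

groups.forEach((value, i) => console.log(`Group${i + 1} ${value}`));
```

Because the pattern separates the fields with a single literal space, a line with tabs or repeated spaces between fields would not match.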
Explanation
^ Start of string
([A-Z0-9]+) Capture 1+ characters listed in the character class, followed by a space
(\d{1,3}(?:\.\d{1,3}){3}) Capture an IP-like format, followed by a space (does not validate an IP)
(\d+) Capture 1+ digits, followed by a space
(\d+) Capture 1+ digits
$ End of string
Upvotes: 0
Reputation: 27733
Another approach is to write one sub-pattern for each of the four fields and use the whitespace between them as a separator, for example:
([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)
where the desired outputs are saved in capturing groups $1 to $4. We can add more boundaries to the expression, such as start and end anchors:
^([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)$
If we wish, we could validate IPs and increase the boundaries.
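One way the validation could look is sketched below, assuming that "valid" means four decimal octets, each in the range 0 to 255. The `octet` fragment, `strictPattern`, and the `parseLine` helper are illustrative names and not part of the original answer.

```javascript
// Matches 0-255: 250-255, 200-249, 100-199, or 0-99 (the last branch also allows a leading zero).
const octet = "(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})";
const strictPattern = new RegExp(
  `^([A-Z0-9]+)\\s+(${octet}(?:\\.${octet}){3})\\s+([0-9]+)\\s+([0-9]+)$`
);

// Hypothetical helper: returns the four fields, or null if the line does not match.
function parseLine(line) {
  const m = strictPattern.exec(line);
  return m ? m.slice(1) : null;
}

console.log(parseLine("23E42B42 93.30.66.103 1535875201 0"));
console.log(parseLine("23E42B42 999.30.66.103 1535875201 0")); // null: 999 is out of range
```

Building the pattern from a string keeps the octet alternation in one place, at the cost of double-escaping the backslashes.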
If this expression is not quite what you need, it can be modified and tested on regex101.com.
This snippet shows how the capturing groups work:
const regex = /^([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)$/gm;
const str = `23E42B42 93.30.66.103 1535875201 0
23E42B42 93.30.66.103 1535875201 012`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
Upvotes: 0
Reputation: 1681
You're close. You need to change \w to \w+ to capture one or more consecutive word characters. Also, try matching spaces with \s+ instead of word boundaries \b.
(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w+)
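To see the difference, here is a small comparison of the pattern from the question with the corrected one, run against the sample line from the question. The variable names (`sample`, `original`, `corrected`, `fields`) are illustrative only.

```javascript
const sample = "23E42B42 93.30.66.103 1535875201 0";

// Pattern from the question: (\w\b) requires a non-word character next,
// but the following group immediately demands a digit, so it can never match.
const original = /(\w\b)(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)(\w\b)(\w\b)/;
// Corrected pattern: \w+ matches whole words and \s+ consumes the separators.
const corrected = /(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w+)/;

console.log(original.test(sample)); // false

const fields = sample.match(corrected).slice(1);
console.log(fields);
```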
Upvotes: 0
Reputation: 21271
This would work for the sample input:
(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w)
https://regex101.com/r/HGMeRL/1/
Upvotes: 1