user1898248

Reputation: 99

Extract text (words) AND IP address from text

I am trying to extract the IP addresses AND the text from a file, not just the IP. This is my current pattern:

(\w\b)(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)(\w\b)(\w\b)

Input data: 23E42B42 93.30.66.103 1535875201 0

Expected:

Group1 23E42B42

Group2 93.30.66.103

Group3 1535875201

Group4 0
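A minimal JavaScript check shows that the pattern above finds no match at all on the sample line. Each \w without a quantifier matches a single character, and nothing in the pattern consumes the spaces between the fields. The variable names are illustrative only.

```javascript
// The pattern from the question, unchanged.
const questionPattern = /(\w\b)(\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b)(\w\b)(\w\b)/;
const sampleLine = "23E42B42 93.30.66.103 1535875201 0";

// exec() returns null when the pattern cannot match anywhere in the string.
const questionResult = questionPattern.exec(sampleLine);
console.log(questionResult); // null
```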

Upvotes: 1

Views: 473

Answers (4)

The fourth bird

Reputation: 163447

In your pattern you have to add a quantifier so that \w matches one or more word characters: \w+. Note that \w itself does not match spaces, so you have to add them to the pattern to match them literally.

You can omit the \b before the space: there is always a word boundary between a \w character and a space, so there is no need to assert it.
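A small JavaScript sketch of both points, using the start of the sample line: \w stops after one character while \w+ consumes the whole first field, and an explicit \b before the space does not change the result.

```javascript
const firstField = "23E42B42 93.30.66.103";

// \w matches exactly one word character ([A-Za-z0-9_]).
const single = firstField.match(/\w/)[0];
// \w+ keeps matching word characters and stops at the first space.
const many = firstField.match(/\w+/)[0];

// An explicit \b before the space changes nothing: the boundary is already there.
const withBoundary = firstField.match(/\w+\b /)[0];
const withoutBoundary = firstField.match(/\w+ /)[0];

console.log(single); // 2
console.log(many);   // 23E42B42
```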

You might use a somewhat more specific match using \d for the digits:

^([A-Z0-9]+) (\d{1,3}(?:\.\d{1,3}){3}) (\d+) (\d+)$

Regex demo

Explanation

  • ^ Start of string
  • ([A-Z0-9]+) Capture 1+ characters from the character class (uppercase letters or digits), then match a space
  • (\d{1,3}(?:\.\d{1,3}){3}) Capture an IP-like format, then match a space (this does not validate the IP)
  • (\d+) Capture 1+ digits, then match a space
  • (\d+) Capture 1+ digits
  • $ End of string
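Running this pattern in JavaScript against the sample line gives the four expected groups. The sketch also illustrates two consequences of the pattern as written: the IP group checks the format only, so 999.999.999.999 is accepted, and the literal single spaces mean a line with two spaces between fields is rejected. The names field1, field3 and field4 are illustrative.

```javascript
// Pattern from this answer: fields separated by single literal spaces.
const fieldPattern = /^([A-Z0-9]+) (\d{1,3}(?:\.\d{1,3}){3}) (\d+) (\d+)$/;

const line = "23E42B42 93.30.66.103 1535875201 0";
const fieldMatch = fieldPattern.exec(line);
if (fieldMatch === null) {
  throw new Error("line did not match");
}
// fieldMatch[0] is the whole match; indices 1..4 are the capture groups.
const [, field1, ip, field3, field4] = fieldMatch;

// Format check only: out-of-range octets still match.
const looksLikeIp = fieldPattern.test("23E42B42 999.999.999.999 1535875201 0");
// Two spaces after the first field: the literal single space no longer fits.
const doubleSpace = fieldPattern.test("23E42B42  93.30.66.103 1535875201 0");

console.log(field1, ip, field3, field4);
```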


Upvotes: 0

Emma
Emma

Reputation: 27733

Another approach is to start with one pattern per field and use the whitespace between them as a separator, for example:

([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)

The desired values are captured in groups $1 to $4. We can add more boundaries to the expression, such as start and end anchors:

^([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)$
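To illustrate what the ^ and $ anchors add, here is a JavaScript sketch on a hypothetical line with an extra trailing field. The unanchored expression still matches and silently ignores the extra text, while the anchored one rejects the line.

```javascript
const loose = /([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)/;
const anchored = /^([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)$/;

// Hypothetical input: the sample line plus an unexpected fifth field.
const extra = "23E42B42 93.30.66.103 1535875201 0 extra";

// Matches the first four fields and ignores " extra".
const looseMatch = loose.exec(extra);
// null: $ requires the string to end right after the fourth field.
const anchoredMatch = anchored.exec(extra);
```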

If we wish, we could also validate the IPs and add stricter boundaries.
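One possible way to validate the IPs is to swap ([0-9.]+) for a group that only accepts octets from 0 to 255. The octet sub-pattern below is an assumption of this sketch, not part of the original answer; it rejects values above 255 and leading zeros such as 030.

```javascript
// Hypothetical octet sub-pattern (an assumption of this sketch):
// 250-255 | 200-249 | 100-199 | 0-99 without a leading zero.
const octet = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
// Four octets joined by literal dots, captured as a single group.
const strictIp = `(${octet}(?:\\.${octet}){3})`;

// Same shape as the anchored expression, with ([0-9.]+) replaced by strictIp.
const strictLine = new RegExp(
  `^([A-Z0-9]+)\\s+${strictIp}\\s+([0-9]+)\\s+([0-9]+)$`
);

const valid = strictLine.exec("23E42B42 93.30.66.103 1535875201 0");
const outOfRange = strictLine.test("23E42B42 256.30.66.103 1535875201 0");
const leadingZero = strictLine.test("23E42B42 93.030.66.103 1535875201 0");
const threeOctets = strictLine.test("23E42B42 93.30.66 1535875201 0");
```

The trade-off is readability: the stricter group is much longer, so it is only worth it if the input may contain malformed addresses.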


RegEx

If this expression is not what you want, you can modify or test it on regex101.com.

RegEx Circuit

jex.im visualizes regular expressions:

[Image: jex.im visualization of the expression]

Demo

This snippet shows how the capturing groups work:

const regex = /^([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)$/gm;
const str = `23E42B42 93.30.66.103 1535875201 0
23E42B42     93.30.66.103     1535875201   012`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
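A more compact alternative to the exec loop above, assuming a runtime that supports String.prototype.matchAll (ES2020), collects the four groups of every line directly. It uses the same expression and input as the snippet; only the iteration style differs.

```javascript
const lineRegex = /^([A-Z0-9]+)\s+([0-9.]+)\s+([0-9]+)\s+([0-9]+)$/gm;
const input = `23E42B42 93.30.66.103 1535875201 0
23E42B42     93.30.66.103     1535875201   012`;

// matchAll requires the g flag and returns an iterator of match arrays,
// so no manual lastIndex bookkeeping is needed.
const rows = [...input.matchAll(lineRegex)].map((match) => match.slice(1));

// Each row holds [group1, group2, group3, group4] for one line.
for (const row of rows) {
  console.log(row);
}
```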

Upvotes: 0

Irfan434

Reputation: 1681

You're close. You need to change \w to \w+ to capture one or more consecutive word characters. Also, match the spaces with \s+ instead of using word boundaries \b.

(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w+)
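Because \s+ accepts one or more whitespace characters, this pattern also handles fields separated by runs of spaces. A short JavaScript check, using a line with extra spaces between the fields:

```javascript
const spacedPattern = /(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w+)/;

// The same kind of fields, separated by several spaces instead of one.
const spacedLine = "23E42B42     93.30.66.103     1535875201   012";
const spacedMatch = spacedPattern.exec(spacedLine);
// Drop the full match (index 0) and keep only the four groups.
const spacedGroups = spacedMatch.slice(1);
console.log(spacedGroups);
```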

Upvotes: 0

Umair Ayub

Reputation: 21271

This would work for the sample input (note that the last group (\w) captures only a single character):

(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w)

https://regex101.com/r/HGMeRL/1/
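The single-character last group is enough for the sample value 0, but on a hypothetical line ending in 012 it silently captures only the first digit. Since the pattern is not anchored, the match simply stops there. A JavaScript sketch comparing (\w) with (\w+):

```javascript
const answerPattern = /(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w)/;
const fixedPattern = /(\w+)\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(\w+)\s+(\w+)/;

const oneCharMatch = answerPattern.exec("23E42B42 93.30.66.103 1535875201 0");
const multiCharMatch = answerPattern.exec("23E42B42 93.30.66.103 1535875201 012");
// With (\w+) the whole last field is captured.
const fixedMatch = fixedPattern.exec("23E42B42 93.30.66.103 1535875201 012");

console.log(oneCharMatch[4], multiCharMatch[4], fixedMatch[4]);
```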

Upvotes: 1
