Reputation: 31
I'm trying to use a regular expression to extract the name from a string. The name is always followed by a protocol. The protocols are: ssh, folder, http.
Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *
Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *
Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *
The expected output would be:
John
Jake
Steve
Upvotes: 1
Views: 4385
Reputation: 27723
Another approach would be to use the single letter and space right before the name as a left boundary, then collect the name's letters and save them in capturing group $1, maybe similar to:
\s+[a-z]\s+([A-Z][a-z]+)
We can also add more boundaries to it if necessary.
If this expression is not what you want, it can be modified and tested on regex101.com.
jex.im visualizes regular expressions. The expression can be tested in JavaScript:
const regex = /\s+[a-z]\s+([A-Z][a-z]+)/gm;
const str = `Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *
Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *
Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *`;
let m;
while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
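One way to add the extra boundary mentioned above is to also require that the captured word is followed by one of the three protocols from the question. This is only a minimal sketch: the function name extractName is made up for illustration, and it assumes names are a capital letter followed by lowercase ASCII letters, as in the sample lines.

```javascript
// Hypothetical helper: single-letter left boundary plus a protocol right boundary.
function extractName(line) {
  // \s[a-z]\s+          a single lowercase letter surrounded by whitespace
  // ([A-Z][a-z]+)       the capitalized name, captured in group 1
  // \s+(?:ssh|...)\b    the name must be followed by one of the known protocols
  const regex = /\s[a-z]\s+([A-Z][a-z]+)\s+(?:ssh|folder|http)\b/;
  const m = regex.exec(line);
  return m ? m[1] : null; // null when the line has no name/protocol pair
}

const sample =
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *";
console.log(extractName(sample)); // John
```

With the right boundary in place, a capitalized word that merely follows a single letter is no longer enough to produce a match.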
Upvotes: 0
Reputation: 2293
Try:
\b[A-Za-z]+(?=\s(?=ssh|folder|http))
let regex = /\b[A-Za-z]+(?=\s(?=ssh|folder|http))/g;
let [match] = "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *".match(regex);
console.log(match); //John
[match] = "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *".match(regex);
console.log(match); //Jake
[match] = "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *".match(regex);
console.log(match); //Steve
Regex explanation:
\b
defines a word boundary to start the match
[A-Za-z]
matches any ASCII letter, either case
+
repeats the previous character class one or more times, up to the next pattern
(?=
starts a lookahead (which won't be included in the match)
\s
a whitespace character
(?=ssh|folder|http)
another lookahead for either ssh, folder or http
Putting it all together, the regex looks for a word that is followed by a space and then one of the following: ssh, folder, or http.
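To pull every name out of a multi-line log in one pass, the same idea can be combined with String.prototype.matchAll. This is a sketch rather than the answer's exact code: the helper name extractNames is hypothetical, the nested lookahead is replaced by an equivalent non-capturing group, and a trailing \b is added as an assumption so that a longer word starting with a protocol name does not count.

```javascript
// Hypothetical helper: collect every word that is followed by a protocol.
function extractNames(log) {
  // The g flag is required by matchAll; the lookahead keeps the protocol out of the match.
  const regex = /\b[A-Za-z]+(?=\s(?:ssh|folder|http)\b)/g;
  return Array.from(log.matchAll(regex), (m) => m[0]);
}

const log = [
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 *",
  "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 *",
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *",
].join("\n");

console.log(extractNames(log).join(", ")); // John, Jake, Steve
```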
Upvotes: 1
Reputation: 12438
You can use the following PCRE regex (as you haven't specified which language you are using):
\b[a-zA-Z]+(?=\s+(?:ssh|folder|http))
demo: https://regex101.com/r/t62Ra7/4/
Explanations:
\b
starts the match at a word boundary
[a-zA-Z]+
matches any sequence of ASCII letters in the a-zA-Z range; you might have to generalise this to accept Unicode letters
(?=
lookahead that adds the constraint that the name is followed by one of the protocols
\s+
one or more whitespace characters
(?:ssh|folder|http)
non-capturing group for the protocols ssh, folder or http
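The Unicode generalisation mentioned above could look roughly like this in JavaScript (not PCRE). It is a sketch under a few assumptions: the helper name extractUnicodeName and the accented sample name are made up, \p{L} needs the u flag, and \b is replaced by a lookbehind because JavaScript's \b only treats ASCII word characters as part of a word.

```javascript
// Hypothetical helper: like the ASCII pattern, but accepts any Unicode letter.
function extractUnicodeName(line) {
  // (?<!\p{L})   the match must not be preceded by another letter
  // \p{L}+       one or more Unicode letters (requires the u flag)
  // (?=...)      followed by whitespace and a protocol that ends at whitespace or end of line
  const regex = /(?<!\p{L})\p{L}+(?=\s+(?:ssh|folder|http)(?!\S))/u;
  const m = line.match(regex);
  return m ? m[0] : null;
}

// Made-up line with an accented name ("Jos\u00e9" is Jose with an acute accent on the e).
const unicodeLine =
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r Jos\u00e9 ssh 0 *";
console.log(extractUnicodeName(unicodeLine));
```

With \b in place of the lookbehind, a name that begins with a non-ASCII letter would lose that first letter, since no ASCII word boundary exists in front of it.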
Upvotes: 2
Reputation: 40034
Here's how you might do it in Java.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractName {
    public static void main(String[] args) {
        String[] str = {
            "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o r John ssh 0 * ",
            "Thu May 23 22:42:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o i Jake folder 0 * ",
            "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 * ",
        };
        // The backslash in \w must be doubled inside a Java string literal.
        String pat = "(\\w+) (ssh|folder|http)";
        Pattern p = Pattern.compile(pat);
        for (String s : str) {
            Matcher m = p.matcher(s);
            if (m.find()) {
                // Group 1 holds the name; group 2 holds the protocol.
                System.out.println(m.group(1));
            }
        }
    }
}
The actual pattern is in the string pat and, once the doubled backslash is reduced to a single one, can be used with other regex engines. It simply matches a name followed by a space followed by the protocols or'd together, and it captures the name in the first capture group.
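As an illustration of reusing the pattern in another engine, here is a rough JavaScript equivalent. The function name nameAndProtocol and the returned object shape are assumptions made for this sketch; only the pattern itself comes from the Java code.

```javascript
// Same pattern as the Java string pat, written as a regex literal,
// so the backslash does not need to be doubled.
const pattern = /(\w+) (ssh|folder|http)/;

// Hypothetical helper: returns both capture groups, or null if nothing matches.
function nameAndProtocol(line) {
  const m = pattern.exec(line);
  // m[1] is the name, m[2] is the protocol that follows it
  return m ? { name: m[1], protocol: m[2] } : null;
}

const result = nameAndProtocol(
  "Thu May 23 22:41:55 2019 19 10.10.10.20 22131676 /mnt/tmp/test.txt b s o t Steve http 0 *"
);
console.log(result.name, result.protocol); // Steve http
```

Unlike the lookahead versions, this consumes the protocol as part of the match, which is why the name has to be read from group 1 rather than from the whole match.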
Upvotes: 0