Reputation: 59
I have a log file with lines like this:
11-test.domain1.com Logged ...
37-user1.users.domain2.org Logged ...
48-me.server.domain3.net Logged ...
How can I extract each domain without the subdomains? Something between "-" and "Logged".
I have the following C++ code (on Linux), but it doesn't extract the domain correctly. A function that returns the extracted string would be great, if you have an example.
regex_t preg;
regmatch_t mtch[1];
size_t rm, nmatch;
char tempstr[1024] = "";
int start;
rm=regcomp(&preg, "-[^<]+Logged", REG_EXTENDED);
nmatch = 1;
while(regexec(&preg, buffer+start, nmatch, mtch, 0)==0) /* Found a match */
{
strncpy(host, buffer+start+mtch[0].rm_so+3, mtch[0].rm_eo-mtch[0].rm_so-7);
printf("%s\n", tempstr);
start +=mtch[0].rm_eo;
memset(host, '\0', strlen(host));
}
regfree(&preg);
Thank you!
P.S. No, I cannot use Perl for this, because this part is inside a larger C program written by someone else.
EDIT:
I replaced the code with this:
const char *p1 = strstr(buffer, "-")+1;
const char *p2 = strstr(p1, " Logged");
size_t len = p2-p1;
char *res = (char*)malloc(sizeof(char)*(len+1));
strncpy(res, p1, len);
res[len] = '\0';
This extracts the whole domain, including subdomains, very well. How can I extract just domain.com or domain.net from abc.def.domain.com?
Is strtok a good option, and how can I find the last dot?
Upvotes: 1
Views: 413
Reputation: 289
Is the log in a standard format? It appears so. Is there a split function available?
Edit: Here is some logic. Iterate over each line to be parsed. Find the index of the first string, "-". Then find the index of the second string, "Logged". The text between those two indexes is the full domain.
Once you have the full domain, split it on "." into the container of your choice (I used an array). Then take the last two elements and concatenate them with a "." to capture only the domain.
NOTE: Written in C#.
Main method, which defines the first value and the second value:
using System.Collections.Generic;
using System.Diagnostics;

static void Main(string[] args)
{
    string firstValue = "-";
    string secondValue = "Logged";
    List<string> domains = new List<string>
    {
        "11-test.domain1.com Logged",
        "37-user1.users.domain2.org Logged",
        "48-me.server.domain3.net Logged"
    };
    foreach (string dns in domains)
    {
        Debug.WriteLine(Utility.GetStringBetweenFirstAndSecond(dns, firstValue, secondValue));
    }
}
Method to parse the string:
// Static, so that it can be called as Utility.GetStringBetweenFirstAndSecond(...).
public static string GetStringBetweenFirstAndSecond(string str, string firstStringToFind, string secondStringToFind)
{
    string domain = string.Empty;
    if (string.IsNullOrEmpty(str))
    {
        // Throw an exception, return gracefully, whatever you determine.
    }
    else
    {
        // This can all be done in one line, but I broke it apart so it can be better understood.
        // IndexOf returns the first occurrence.
        //int start = str.IndexOf(firstStringToFind) + 1;
        //int end = str.IndexOf(secondStringToFind);
        //domain = str.Substring(start, end - start);
        // The one-line form is not quite as legible, but avoids the extra locals.
        // Trim() drops the space that precedes "Logged".
        domain = str.Substring((str.IndexOf(firstStringToFind) + 1), str.IndexOf(secondStringToFind) - (str.IndexOf(firstStringToFind) + 1)).Trim();
        string[] dArray = domain.Split('.');
        if (dArray.Length > 2)
        {
            // Keep only the last two labels, e.g. "domain1.com".
            domain = string.Format("{0}.{1}", dArray[dArray.Length - 2], dArray[dArray.Length - 1]);
        }
    }
    return domain;
}
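Since the question is about C++, the same logic (find "-", find "Logged", keep the last two dot-separated labels) could be sketched with std::string as below. This is only a sketch: the function name extractDomain is hypothetical, and it assumes the registrable domain is always the last two labels, which does not hold for suffixes such as co.uk.

```cpp
#include <string>

// Hypothetical helper, not from the original answer.
// Takes the host between the first '-' and the following " Logged" and
// returns its last two dot-separated labels.
// Returns an empty string if the line does not contain both markers.
std::string extractDomain(const std::string& line)
{
    const std::string::size_type dash = line.find('-');
    if (dash == std::string::npos)
        return "";

    const std::string::size_type end = line.find(" Logged", dash + 1);
    if (end == std::string::npos)
        return "";

    // Full host, subdomains included, e.g. "test.domain1.com".
    const std::string host = line.substr(dash + 1, end - dash - 1);

    // Locate the last dot, then the dot before it.
    const std::string::size_type lastDot = host.rfind('.');
    if (lastDot == std::string::npos || lastDot == 0)
        return host;                      // no second label to strip
    const std::string::size_type prevDot = host.rfind('.', lastDot - 1);
    if (prevDot == std::string::npos)
        return host;                      // already just "domain.tld"

    return host.substr(prevDot + 1);
}
```

Calling extractDomain on each line read from the log file returns the shortened domain as a std::string, so no manual malloc or strncpy is needed.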
Upvotes: 0
Reputation: 55887
#include <algorithm>
#include <iostream>
#include <vector>
#include <string>
#include <boost/regex.hpp>
int main()
{
// \S+ keeps the space before "Logged" out of the captured domain.
boost::regex re(".+-(?<domain>\\S+)\\s*Logged");
std::string examples[] =
{
"11-test.domain1.com Logged ...",
"37-user1.users.domain2.org Logged ..."
};
std::vector<std::string> vec(examples, examples + sizeof(examples) / sizeof(*examples));
std::for_each(vec.begin(), vec.end(), [&re](const std::string& s)
{
boost::smatch match;
if (boost::regex_search(s, match, re))
{
std::cout << match["domain"] << std::endl;
}
});
}
Something like this with boost::regex (live example: http://liveworkspace.org/code/1983494e6e9e884b7e539690ebf98eb5). I don't know about PCRE.
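To drop the subdomains as well, the pattern could capture only the last two labels before "Logged". The sketch below swaps in std::regex from C++11 instead of boost::regex so that it has no external dependency. The function name domainFromLine is hypothetical, and the pattern assumes a two-label domain, so it would not handle suffixes such as co.uk.

```cpp
#include <regex>
#include <string>

// Hypothetical helper using std::regex (C++11), not boost::regex.
// Pattern, left to right:
//   -                      the dash after the leading number
//   (?:[^.\s]+\.)*         any number of subdomain labels, each ending in a dot
//   ([^.\s]+\.[^.\s]+)     capture group 1: the last two labels
//   \s+Logged              whitespace, then the literal "Logged"
std::string domainFromLine(const std::string& line)
{
    static const std::regex re("-(?:[^.\\s]+\\.)*([^.\\s]+\\.[^.\\s]+)\\s+Logged");
    std::smatch match;
    if (std::regex_search(line, match, re))
        return match[1].str();
    return "";  // line did not match the expected format
}
```

The subdomain group is greedy, but the regex engine backtracks until exactly two labels remain for the capture group, so a line with no subdomains still matches.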
Upvotes: 1