Jehanzeb.Malik

Reputation: 3390

URL Pattern Matching (PHP)?

(Programming Language: PHP v5.3)

I am working on a website that searches specific sites using the Google and Bing search APIs.

The Project:

A user can select a website to search from a drop-down list. We have an admin panel on this website. If the admin wants to add a new website to the drop-down list, he has to provide two sample URLs from the site as shown below.

Form Image

When the form is submitted, our code processes the input and generates a regex that we later use for pattern matching. The regex is stored in the database.

In a different form, a visiting user selects a website from the drop-down list and enters a search query in a text box. We fetch results as JSON from the search APIs mentioned above, using the following query syntax as the search string:

"site:website query"
(where "website" is replaced with the website the user chose and "query" with the user's search query).
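
To make the syntax concrete, here is a minimal sketch of how such a search string could be assembled. The function name buildSiteQuery() is hypothetical and is not taken from our actual code; it only assumes the website may be stored with or without a scheme.

```php
<?php
// Hypothetical helper (illustration only): builds the "site:website query" string.
function buildSiteQuery($website, $query)
{
    // Drop a leading scheme so "http://www.example.com" becomes "www.example.com".
    $host = preg_replace('~^https?://~i', '', trim($website));
    // Drop a trailing slash, if any.
    $host = rtrim($host, '/');

    return 'site:' . $host . ' ' . trim($query);
}

$searchString = buildSiteQuery('www.example.com', 'abcd');

echo $searchString, "\n";            // site:www.example.com abcd
echo urlencode($searchString), "\n"; // site%3Awww.example.com+abcd
```

The URL-encoded form is what would go into the request to the search API; the parameter name it is sent under depends on the API and is left out here.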

The Problem

Now we have to find the best-matching URL. The reason for pattern matching is that search results sometimes contain unwanted links. For example, say I search the website "www.example.com" for an article named "abcd". The search engines might return these two URLs:

1) www.example.com/articles/854/abcd
2) www.example.com/search/abcd
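
As a rough illustration of the filtering I am after, the sketch below applies a stored pattern to these two results. The pattern and the function name filterByPattern() are made up for this example; the real pattern is the one generated from the admin's sample URLs.

```php
<?php
// Hypothetical stored pattern for www.example.com: /articles/<numeric id>/<slug>
$pattern = '~^(?:https?://)?www\.example\.com/articles/\d+/[^/?#]+/?$~i';

// Keeps only the URLs that match the given pattern.
function filterByPattern(array $urls, $pattern)
{
    $kept = array();
    foreach ($urls as $url) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $url) === 1) {
            $kept[] = $url;
        }
    }
    return $kept;
}

$results = array(
    'www.example.com/articles/854/abcd',
    'www.example.com/search/abcd',
);

// Only the article URL survives; the /search/ URL is dropped.
print_r(filterByPattern($results, $pattern));
```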

The first URL is the one I want. I have two issues to resolve:

1) I know that the code I wrote to build a regex from the sample URLs will never be perfect, since the admin adds websites on a regular basis. A single piece of code can never have enough conditions to produce a correct pattern for every website. Is there a better way to do this, or is regex my only option?

2) I am developing on a machine running Windows 7, where preg_match_all() returns results. But when I move the code to the server, which runs Linux, preg_match_all() returns no results for the same parameters. Does anyone know why this is happening?

I have been working with web technologies for only the past few weeks, so I don't know whether there are better options than regex. I would be very grateful if you could help me or point me towards resources where I can find a solution to these problems.

Upvotes: 2

Views: 992

Answers (1)

Tivie

Reputation: 18923

About question 1: I can't quite grasp what you're trying to accomplish so I can't give any valid opinion.

Regarding question 2: If both servers are running the same version of PHP, the regex library used ought to be the same. You can test this, however, by making a mock static file or string to test against the regex and see if the results are the same.
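
A minimal sketch of such a test follows. The subject string and the pattern are made up for illustration (your real pattern comes from your database); the point is only that the input is fixed, so any difference between the two machines must come from PHP/PCRE rather than from the search results.

```php
<?php
// Fixed input: identical on every machine, unlike live search results.
$subject = "www.example.com/articles/854/abcd\nwww.example.com/search/abcd\n";

// Hypothetical pattern standing in for the one stored in the database.
$pattern = '~www\.example\.com/articles/\d+/[^/\s]+~';

$count = preg_match_all($pattern, $subject, $matches);

// Print the environment so the two machines can be compared.
echo 'PHP ', PHP_VERSION, ', PCRE ', PCRE_VERSION, "\n";

var_dump($count);            // number of full matches, or false on error
var_dump(preg_last_error()); // PREG_NO_ERROR (0) when the engine did not fail
print_r($matches[0]);        // the matched substrings
```

Run the same script on the Windows machine and on the Linux server and compare the output. If both report the same matches, the regex engine is not the culprit and the input data is what differs.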

Since you're grabbing results from the search engines and then parsing them, the data retrieved might not be the same on both machines. Google/Bing may vary part of the response depending on the OS you use, and that could alter the preg results.

Upvotes: 1
