art3m1sm00n

Reputation: 409

In Java, what's the best way to read a URL and split it into its parts?

Firstly, I am aware that there are other similar posts, but since mine involves a URL and I am not always sure what my delimiter will be, I feel it is all right to post my question. My assignment is to make a crude web browser. I have a textField into which the user enters the desired URL. I then obviously have to navigate to that webpage. Here is an example from my teacher of roughly what my code should produce. This is the text I'm supposed to send to my socket. Sample URL: http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

 GET /wiki/Hypertext_Transfer_Protocol HTTP/1.1\n
Host: en.wikipedia.org\n
\n

So my question is this: I am going to read in the URL as one complete string, so how do I extract just the "en.wikipedia.org" part and just the extension (the "/wiki/Hypertext_Transfer_Protocol" part)? I tried this as a test:

 String url = "http://en.wikipedia.org/wiki/Hypertext Transfer Protocol";
    String done = " ";
    String[] hope = url.split(".org");

    for ( int i = 0; i < hope.length; i++)
    {
        done = done + hope[i];
    }
    System.out.println(done);

This just prints out the URL without the ".org" in it. I think I'm on the right track, but I am not sure. Also, I know that websites can have different endings (.org, .com, .edu, etc.), so I am assuming I'll need a few if statements to compensate for the possible endings. Basically, how do I split the URL into the two parts that I need?

Upvotes: 12

Views: 43185

Answers (5)

Nolequen

Reputation: 4257

Even though the answer using the URL class is great, here is one more way to split a URL into its components, using a regular expression:

"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?"
  ||            |  |          |       |   |        | |
  12 - scheme   |  |          |       |   |        | |
                3  4 - authority, includes hostname/ip and port number.
                              5 - path|   |        | |
                                      6   7 - query| |
                                                   8 9 - fragment

You can use it with the Pattern class:

var regex = "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?";
var pattern = Pattern.compile(regex);
var matcher = pattern.matcher("http://example.com:80/docs/books/tutorial/index.html?name=networking#DOWNLOADING");
if (matcher.matches()) {
  System.out.println("scheme: " + matcher.group(2));
  System.out.println("authority: " + matcher.group(4));
  System.out.println("path: " + matcher.group(5));
  System.out.println("query: " + matcher.group(7));
  System.out.println("fragment: " + matcher.group(9));
}

Upvotes: 1

Abhinav Katyayen

Reputation: 1

You can use the String class's split() method, store the result in a String array, then iterate over the array and put each name and value into a Map. Note that this handles only the query-string part of a URL.

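import java.util.HashMap;
import java.util.Map;

public class URLSPlit {
    public static Map<String, String> splitString(String s) {
        // Split on runs of '=', '&', '?' (and spaces), giving name, value, name, value, ...
        String[] split = s.split("[= & ?]+");
        Map<String, String> maps = new HashMap<>();

        // Stop before a trailing name with no value so split[i + 1] stays in bounds.
        for (int i = 0; i + 1 < split.length; i += 2) {
            maps.put(split[i], split[i + 1]);
        }

        return maps;
    }

    public static void main(String[] args) {
        String word = "q=java+online+compiler&rlz=1C1GCEA_enIN816IN816&oq=java+online+compiler&aqs=chrome..69i57j69i60.18920j0j1&sourceid=chrome&ie=UTF-8?k1=v1";
        Map<String, String> newmap = splitString(word);

        // HashMap does not preserve insertion order, so the print order may vary.
        for (Map.Entry<String, String> entry : newmap.entrySet()) {
            System.out.println(entry.getKey() + "  =  " + entry.getValue());
        }
    }
}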

Upvotes: -1

Reza Shirazian

Reputation: 2353

Instead of url.split(".org"); try url.split("/"); and iterate through your array of strings.
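As a rough sketch of the split("/") idea, assuming the input always looks like scheme://host/path: note that split takes a regular expression, so the "." in ".org" matches any character, while "/" has no special meaning. The class and method names below are made up for illustration.

```java
// Hypothetical helper; the names are illustrative only.
final class SlashSplitSketch {

    /** Returns {host, path} for a URL shaped like scheme://host/path. */
    static String[] hostAndPath(String url) {
        // A limit of 4 keeps everything after the third "/" in one piece:
        // "http:", "", "en.wikipedia.org", "wiki/Hypertext_Transfer_Protocol"
        String[] parts = url.split("/", 4);
        if (parts.length < 3 || !parts[1].isEmpty()) {
            throw new IllegalArgumentException("Expected scheme://host/...: " + url);
        }
        String host = parts[2];  // still includes ":port" if one was given
        String path = parts.length > 3 ? "/" + parts[3] : "/";
        return new String[] { host, path };
    }

    public static void main(String[] args) {
        String[] hp = hostAndPath("http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol");
        System.out.println("host = " + hp[0]);
        System.out.println("path = " + hp[1]);
    }
}
```

This needs no if statements for .org, .com, .edu and so on, because the host is simply whatever sits between "//" and the next "/". The path part still carries any query string or fragment.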

Or you can look into regular expressions. This is a good example to start with.

Good luck on your homework.

Upvotes: 1

piokuc

Reputation: 26184

This is how you should split your URL parts: http://docs.oracle.com/javase/tutorial/networking/urls/urlInfo.html

Upvotes: 1

Óscar López

Reputation: 236014

The URL class pretty much does this; look at the tutorial. For example, given this URL:

http://example.com:80/docs/books/tutorial/index.html?name=networking#DOWNLOADING

This is the kind of information you can expect to obtain:

protocol = http
authority = example.com:80
host = example.com
port = 80
path = /docs/books/tutorial/index.html
query = name=networking
filename = /docs/books/tutorial/index.html?name=networking
ref = DOWNLOADING
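A minimal sketch of how those getters could produce the request text from the question, assuming an absolute http URL. The class and method names are invented for illustration, and the "\n" line endings simply mirror the teacher's sample.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper class; the name and method are illustrative only.
final class GetRequestSketch {

    /** Builds the request text from the question for an absolute http URL. */
    static String buildGetRequest(String address) {
        URL url;
        try {
            // URI.create(...).toURL() is used because the URL(String)
            // constructor is deprecated in recent JDKs.
            url = URI.create(address).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Unsupported URL: " + address, e);
        }
        String host = url.getHost();   // e.g. en.wikipedia.org
        String file = url.getFile();   // path plus "?query" when present
        if (file.isEmpty()) {
            file = "/";                // a bare "http://host" still needs a path
        }
        // Line endings follow the teacher's sample; HTTP itself specifies "\r\n".
        return "GET " + file + " HTTP/1.1\n"
             + "Host: " + host + "\n"
             + "\n";
    }

    public static void main(String[] args) {
        System.out.print(buildGetRequest(
            "http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol"));
    }
}
```

One caveat: URI.create rejects raw spaces, so a URL like the one in the question's test code ("Hypertext Transfer Protocol") would need its spaces encoded or replaced with underscores first.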

Upvotes: 46
