Can Matlab eliminate the path in a URL and leave only the domain part?

Question

Can Matlab eliminate the path in a URL and leave only the domain part? Does Matlab have any function that removes everything after the domain?

For example:

 input  :http://www.mathworks.com/help/images/removing-noise-from-images.html
 output :http://www.mathworks.com

chappjc · Accepted Answer

This regexp pattern should do the trick:

>> str = 'http://www.mathworks.com/help/images/removing-noise-from-images.html';
>> out = regexp(str,'\w*://[^/]*','match','once')
out = 
    'http://www.mathworks.com'

The search pattern '\w*://[^/]*' says: look for a string that starts with some "word" characters ('\w*') corresponding to the protocol (e.g. http, https, rtsp), followed by the ubiquitous ://, and then any number of characters that are not a forward slash ('[^/]*').
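To see the pattern at work on other protocols, here is a small sketch. The URLs are made-up sample inputs, not taken from the question; only the pattern is the one above. With a cell array as input, regexp returns one result per element, and an input with no :// in it should come back empty.

```matlab
% Sample inputs (hypothetical, for illustration only)
urls = {'https://example.com/a/b.html';
        'rtsp://media.example.org/stream/1';
        'ftp://files.example.net';
        'google.com/voice'};          % no protocol, so there is no '://' to match

% protocol (\w*), then '://', then everything up to the next '/'
pattern = '\w*://[^/]*';
out = regexp(urls, pattern, 'match', 'once');

% Flag the inputs the pattern could not handle
noMatch = cellfun(@isempty, out);
```

Here noMatch should be true only for the last entry, which is the case this first pattern does not cover.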

Edit: The 'once' option makes regexp return the match directly as a string, rather than nested inside a cell.
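A quick way to see what 'once' changes is to run the same call with and without it. This is only a sketch; the variable names allMatches and firstMatch are mine.

```matlab
str = 'http://www.mathworks.com/help/images/removing-noise-from-images.html';
pattern = '\w*://[^/]*';

allMatches = regexp(str, pattern, 'match');          % cell array of matches
firstMatch = regexp(str, pattern, 'match', 'once');  % the match itself, as char

% Without 'once' the text has to be pulled out of the cell first
sameText = isequal(allMatches{1}, firstMatch);
```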

UPDATE: To get just the hostname, and to allow inputs with no protocol:

>> str = {'http://www.mathworks.com/help/images/removing-noise-from-images.html';
          'https://www.mathworks.com/help/matlab/ref/strcmpi@dfvfv.html';
          'google.com/voice'}
>> out = regexp(str,'([^/]*)(?=/[^/])','match','once')
out = 
    'www.mathworks.com'
    'www.mathworks.com'
    'google.com'
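Reading this pattern piece by piece may help. The lookahead requires a single slash followed by a non-slash character, so the match ends where the path begins. As far as I understand it, this also relies on regexp ignoring zero-length matches by default (the 'emptymatch' option changes that). The sketch below, using inputs borrowed from UPDATE 2, suggests the limitation: a URL with only a trailing slash, or with no slash at all, should give an empty result.

```matlab
% Pattern from the UPDATE above:
%   ([^/]*)     a run of non-slash characters (the candidate hostname)
%   (?=/[^/])   lookahead: next comes a single '/' followed by a non-slash
pattern = '([^/]*)(?=/[^/])';

str = {'google.com/voice';          % has a path
       'http://monkey.org/';        % trailing slash only
       'meta.stackoverflow.com'};   % no slash at all
out = regexp(str, pattern, 'match', 'once');

% Which inputs actually produced a hostname
hasHost = ~cellfun(@isempty, out);
```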

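UPDATE 2: regexp madness! This version also handles a trailing slash or no path at all, and keeps the protocol when there is one.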

>> str = {'http://www.mathworks.com/help/images/removing-noise-from-images.html';
          'https://www.mathworks.com/help/matlab/ref/strcmpi@dfvfv.html';
          'google.com/voice';
          'http://monkey.org/';
          'stackoverflow.com/';
          'meta.stackoverflow.com'};
>> out = regexp(str,'.*?[^/](?=(/([^/]|$)|$))','match','once')
out = 
    'http://www.mathworks.com'
    'https://www.mathworks.com'
    'google.com'
    'http://monkey.org'
    'stackoverflow.com'
    'meta.stackoverflow.com'

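Wrapped up as a function:

% hostname.m
function hostnames = hostname(str)
% HOSTNAME  Strip the path from a URL, keeping the protocol (if any) and host.
%   STR is a string or a cell array of strings.

hostnames = regexp(str,'.*?[^/](?=(/([^/]|$)|$))','match','once');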
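A possible way to use this, sketched with an anonymous function standing in for hostname.m so the snippet runs on its own. The final regexprep step is my addition, not part of the answer above: it drops a leading protocol if you want the bare hostname as in the first UPDATE.

```matlab
% Stand-in for hostname.m (same pattern), so this runs without the file.
% Pattern: lazily take characters up to the first non-slash character that is
% followed by a single '/' (then a non-slash or the end) or by the end.
hostname = @(str) regexp(str, '.*?[^/](?=(/([^/]|$)|$))', 'match', 'once');

urls = {'http://www.mathworks.com/help/images/removing-noise-from-images.html';
        'stackoverflow.com/'};
withProtocol = hostname(urls);

% Optional extra step (an assumption, not in the original answer):
% remove a leading 'protocol://' to leave only the hostname
bareHost = regexprep(withProtocol, '^\w*://', '');
```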
