user3340270

Reputation: 187

Can MATLAB strip the path from a URL and leave only the domain part?

Can MATLAB remove the path from a URL and leave only the domain part? Is there a built-in function that strips everything after the domain?

For example:

 input  :http://www.mathworks.com/help/images/removing-noise-from-images.html
 output :http://www.mathworks.com

Upvotes: 2

Views: 379

Answers (2)

chappjc

Reputation: 30589

This regexp pattern should do the trick:

>> str = 'http://www.mathworks.com/help/images/removing-noise-from-images.html';
>> out = regexp(str,'\w*://[^/]*','match','once')
out = 
    'http://www.mathworks.com'

The search pattern '\w*://[^/]*' says: look for a string that starts with some "word" characters (\w*) corresponding to the protocol (e.g. http, https, rtsp), followed by the ubiquitous ://, and then any number of characters that are not a forward slash ([^/]*). The match therefore stops right before the first slash of the path.
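To get a feel for how the pattern behaves on other inputs, here is a small sketch. The example URLs are made up for illustration and are not from the question. Note that an input without :// gives no match at all, since the pattern requires it.

```matlab
% Hypothetical inputs chosen to exercise the pattern '\w*://[^/]*'.
pat = '\w*://[^/]*';

a = regexp('https://example.com/a/b.html',    pat, 'match', 'once');  % protocol + host
b = regexp('rtsp://media.example.com/stream', pat, 'match', 'once');  % other protocols work too
c = regexp('example.com/a/b.html',            pat, 'match', 'once');  % no '://', so nothing matches

disp(a)
disp(b)
disp(isempty(c))
```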

Edit: The 'once' option returns only the first match and hands it back directly, rather than wrapped in an extra (nested) cell.
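A quick way to see what 'once' changes is to check the class of the result with and without it. This is only a sketch, and the variable names are mine:

```matlab
str = 'http://www.mathworks.com/help/images/removing-noise-from-images.html';
pat = '\w*://[^/]*';

allMatches = regexp(str, pat, 'match');          % cell array holding every match
firstMatch = regexp(str, pat, 'match', 'once');  % character vector, first match only

% With a cell array input, omitting 'once' puts a cell inside each cell.
nested = regexp({str}, pat, 'match');
flat   = regexp({str}, pat, 'match', 'once');

disp(class(allMatches))
disp(class(firstMatch))
disp(class(nested{1}))
disp(class(flat{1}))
```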


UPDATE: just the hostname, allowing inputs with no protocol.

>> str = {'http://www.mathworks.com/help/images/removing-noise-from-images.html';
          'https://www.mathworks.com/help/matlab/ref/[email protected]';
          'google.com/voice'}
>> out = regexp(str,'([^/]*)(?=/[^/])','match','once')
out = 
    'www.mathworks.com'
    'www.mathworks.com'
    'google.com'
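If lookahead feels hard to read, a different way to get just the hostname is a capture token: skip an optional protocol with a non-capturing group and capture everything up to the first slash. This is a swapped-in alternative, not the pattern above, and it assumes the host is simply whatever precedes the first / (no special handling of ports or user info).

```matlab
urls = {'http://www.mathworks.com/help/images/removing-noise-from-images.html';
        'google.com/voice';
        'meta.stackoverflow.com'};

% (?:\w+://)? skips an optional protocol without capturing it;
% ([^/]+) captures the characters up to the first '/' (or the end).
tok = regexp(urls, '^(?:\w+://)?([^/]+)', 'tokens', 'once');

% Each entry of tok is a 1x1 cell holding the captured host; unwrap it.
hosts = cellfun(@(t) t{1}, tok, 'UniformOutput', false);

disp(hosts)
```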

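UPDATE 2: regexp madness! This version keeps the protocol when there is one, and also copes with a trailing slash or no path at all.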

>> str = {'http://www.mathworks.com/help/images/removing-noise-from-images.html';
          'https://www.mathworks.com/help/matlab/ref/[email protected]';
          'google.com/voice';
          'http://monkey.org/';
          'stackoverflow.com/';
          'meta.stackoverflow.com'};
>> out = regexp(str,'.*?[^/](?=(/([^/]|$)|$))','match','once')
out = 
    'http://www.mathworks.com'
    'https://www.mathworks.com'
    'google.com'
    'http://monkey.org'
    'stackoverflow.com'
    'meta.stackoverflow.com'

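% hostname.m
function hostnames = hostname(str)
% HOSTNAME  Strip the path from a URL (char array or cell array of them).
% The protocol (e.g. http://) is kept if the input has one.

hostnames = regexp(str,'.*?[^/](?=(/([^/]|$)|$))','match','once');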
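Since this pattern keeps the protocol, you may want a second step that drops it when only the bare host is needed. A minimal sketch, with the regexp call repeated inline so the snippet runs without hostname.m on the path (bareHost and withProtocol are just names I picked):

```matlab
str = {'http://monkey.org/'; 'stackoverflow.com/'; 'meta.stackoverflow.com'};

% Same pattern as in hostname.m, applied inline.
withProtocol = regexp(str, '.*?[^/](?=(/([^/]|$)|$))', 'match', 'once');

% Remove a leading 'protocol://' if present; other entries pass through unchanged.
bareHost = regexprep(withProtocol, '^\w+://', '');

disp(bareHost)
```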

Upvotes: 2

Divakar

Reputation: 221624

Code:

function output_url = domain_name(input_url)

c1 = strfind(input_url,'//');
ind1 = strfind(input_url,'/');

if isempty(c1) && isempty(ind1) 
    output_url = input_url; % For case like - www.mathworks.com
    return;
end

if ~isempty(c1)
    if numel(ind1)>2
        output_url = input_url(1:ind1(3)-1); % For cases like - http://www.mathworks.com/ or http://www.mathworks.com/something/
    else
        output_url = input_url; % For case like - http://www.mathworks.com
    end
else
    output_url = input_url(1:ind1(1)-1); % For cases like - www.mathworks.com/ or www.mathworks.com/something/
end

return;

Example runs:

%% Long URLs with extensions
disp(domain_name('www.mathworks.com/help/images/removing-noise-from-images.html'))
disp(domain_name('http://www.mathworks.com/help/images/removing-noise-from-images.html'))

%% Short URLs without HTTP://
disp(domain_name('www.mathworks.com'))
disp(domain_name('www.mathworks.com/'))

%% Short URLs with HTTP://
disp(domain_name('http://www.mathworks.com'))
disp(domain_name('http://www.mathworks.com/'))

Output:

www.mathworks.com
http://www.mathworks.com
www.mathworks.com
www.mathworks.com
http://www.mathworks.com
http://www.mathworks.com

An alternative, and probably more efficient, method would be to use regexp, but I prefer working with indices.

Edit 1: If you want to process a bunch of URLs at the same time, you can put them in a cell array. The output is then a cell array too. The following MATLAB script gives a feel for it:

% Input
in_urls_cell = [{'http://mathworks.com/'},{'mathworks.com/help/matlab/ref/strcmpi.html'},{'mathworks.com/help/matlab/ref/[email protected]'}];

% Get domain name
out_urls_cell = cell(size(in_urls_cell));
for count = 1:numel(in_urls_cell)
    out_urls_cell{count} = domain_name(in_urls_cell{count});
end

% Display only domain name
for count = 1:numel(out_urls_cell)
    disp(out_urls_cell{count});
end

The above script returns -

http://mathworks.com
mathworks.com
mathworks.com
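The loop can also be written with cellfun. In the sketch below, extract is a regexp-based stand-in I made up so the snippet runs on its own; with domain_name.m on the path you would pass @domain_name instead.

```matlab
% Stand-in for domain_name (an assumption, not the original function):
% keep an optional 'protocol://' plus everything up to the first '/'.
extract = @(u) regexp(u, '^(\w+://)?[^/]+', 'match', 'once');

in_urls = {'http://mathworks.com/', 'mathworks.com/help/matlab/ref/strcmpi.html'};

% 'UniformOutput', false makes cellfun return a cell array of char arrays.
out_urls = cellfun(extract, in_urls, 'UniformOutput', false);

cellfun(@disp, out_urls);
```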

Upvotes: 2
