
How to load all the links from a specified HTML document or website in PHP

Loading all the links from a specified website automatically may sound like a huge task. Thanks to PHP's DOM library, it is a piece of cake. Here is a step-by-step guide to loading every anchor tag contained within a specified page.

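Our first steps are to create a DOMDocument object, load the specified HTML document with its loadHTMLFile() method, and then create a DOMXPath object for querying it. The @ in front of the call suppresses the warnings that the parser raises when the page contains malformed markup: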

<?php

$dom = new DOMDocument;

@$dom->loadHTMLFile('http://www.example.com');

$DOMXPath = new DOMXPath($dom);

...

?>
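The @ operator hides every warning, including ones you might want to see. As an alternative sketch, the snippet below collects the parser warnings with libxml_use_internal_errors() so they can be inspected or discarded. To keep it runnable without a network connection, it loads a made-up HTML string with loadHTML() instead of fetching a URL; the $html, $previous, $loaded and $warnings names are only there for illustration.

```php
<?php

// Hypothetical sample markup standing in for a fetched page.
$html = '<html><body><p>Hello<a href="/aboutus.php">About us</a></body></html>';

// Ask libxml to store parse warnings internally instead of emitting them.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$loaded = $dom->loadHTML($html);

// Fetch whatever the parser complained about, then empty the buffer
// and restore the previous error-handling setting.
$warnings = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$DOMXPath = new DOMXPath($dom);
```

Each entry in $warnings is a LibXMLError object, so its message and line properties can be logged if the markup needs investigating.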

Next we select every anchor tag that has an href attribute and store the resulting node list in the variable $aHref. We then go through the elements one at a time and copy each href attribute into an array named $links:

<?php

...

$aHref = $DOMXPath->query('//a[@href]');

// Start with an empty array so $links exists even when the page has no links.
$links = [];

foreach ($aHref as $val)
{

     $links[] = $val->getAttribute('href');

}

...

?>
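To see what the //a[@href] query actually selects, here is a small self-contained sketch. The markup is invented for the example, and it also records each link's text, which the walkthrough above does not do. Notice that the anchor without an href attribute is skipped.

```php
<?php

// Hypothetical sample markup; the second anchor has no href attribute.
$html = '<p><a href="index.html">Home</a> <a name="top">Top</a> '
      . '<a href="http://www.externallink.com"> External </a></p>';

$dom = new DOMDocument;
$dom->loadHTML($html);

$DOMXPath = new DOMXPath($dom);

$links = [];

// Only anchors that carry an href attribute match this XPath expression.
foreach ($DOMXPath->query('//a[@href]') as $anchor) {
    $links[] = [
        'href' => $anchor->getAttribute('href'),
        'text' => trim($anchor->textContent),
    ];
}

print_r($links);
```

This yields two entries, one for index.html and one for http://www.externallink.com, each paired with its trimmed link text.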

Finally, all that is left is to print the links with PHP's print_r() function, wrapped in a pre tag so that the output keeps its line breaks:

<?php

...

echo '<pre>'.print_r($links, true).'</pre>';

?>
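One thing to watch when printing into an HTML page: getAttribute() returns the decoded attribute value, so a link may contain characters such as & or < that the browser would interpret as markup. A minimal sketch of escaping the output first, using made-up sample data in place of the real $links array:

```php
<?php

// Hypothetical sample data standing in for the $links array built earlier.
$links = ['index.html', '/search.php?q=a&b=<c>'];

// Escape the print_r() text before placing it inside the <pre> element.
$output = '<pre>'
        . htmlspecialchars(print_r($links, true), ENT_QUOTES, 'UTF-8')
        . '</pre>';

echo $output;
```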

As you will no doubt see, not every href contains a complete link. An internal link will more than likely be relative, along the lines of "index.html" or "/aboutus.php". Most external links, on the other hand, will be absolute, in the format "http://www.externallink.com".
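If you need complete links throughout, the relative ones can be joined onto the address of the page they came from. The function below is a simplified sketch of that idea, not a full URL resolver. The name resolveLink() is made up for this example, and it deliberately ignores "../" segments as well as links that consist only of a query string or fragment.

```php
<?php

// Hypothetical helper: a simplified resolver, not a complete implementation
// of the URL resolution rules.
function resolveLink(string $href, string $base): string
{
    // Already absolute: the link starts with a scheme such as http: or mailto:.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . $host . $port;

    // Protocol-relative link: reuse the scheme of the base page.
    if (strncmp($href, '//', 2) === 0) {
        return $scheme . ':' . $href;
    }

    // Root-relative link such as "/aboutus.php": append to the origin.
    if (strncmp($href, '/', 1) === 0) {
        return $origin . $href;
    }

    // Document-relative link such as "index.html": append to the
    // directory part of the base page's path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, (int) strrpos($path, '/') + 1);

    return $origin . $dir . $href;
}

echo resolveLink('index.html', 'http://www.example.com'), "\n";
echo resolveLink('/aboutus.php', 'http://www.example.com/blog/post.html'), "\n";
```

The two calls print http://www.example.com/index.html and http://www.example.com/aboutus.php. Passing each entry of $links through such a function, with the loaded page's address as $base, would give a list of absolute links.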

Here then is the complete code discussed above:

<?php

$dom = new DOMDocument;

@$dom->loadHTMLFile('http://www.example.com');

$DOMXPath = new DOMXPath($dom);

$aHref = $DOMXPath->query('//a[@href]');

// Start with an empty array so $links exists even when the page has no links.
$links = [];

foreach ($aHref as $val)
{

      $links[] = $val->getAttribute('href');

}

echo '<pre>'.print_r($links, true).'</pre>';

?>



myDesignTool Networking • www.mydesigntool.com • info@mydesigntool.com