List all links on a website using DOM
Have you heard of DOM before? If not, it stands for Document Object Model, an interface for accessing the elements (tags) and attributes of an XML or XHTML page, such as <book_title> in XML or <img src="pie.jpg"> in XHTML (src being the attribute).
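To make that concrete, here is a small sketch of reading an element and one of its attributes with PHP's DOMDocument. The markup is a made-up example string, not taken from any real page.

```php
<?php
// Made-up markup containing one <img> element with a src attribute
$html = '<html><body><img src="pie.jpg" alt="A pie"></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html); // Parse the markup into a tree of DOM nodes

// getElementsByTagName() returns a DOMNodeList of matching elements
$images = $doc->getElementsByTagName('img');
$img = $images->item(0); // The first <img> element

echo $img->getAttribute('src'), "\n"; // Read the value of its src attribute
```

Running it prints pie.jpg, the value of the src attribute.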
Once you start playing with DOM, you realize how much potential it has for pulling information out of webpages.
So here is a function I have written, with comments to help explain how it works.
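function getLinks($url) {
    $links = array(); // Start with an empty array so we always return an array
    $html = file_get_contents($url); // Download the contents of our desired website
    if ($html === false || $html === '') {
        return $links; // Download failed or page was empty, so there are no links
    }
    $doc = new DOMDocument(); // Create a new DOMDocument object in $doc
    libxml_use_internal_errors(true); // Hide warnings caused by malformed real-world HTML
    $doc->loadHTML($html); // Load the page's HTML into $doc
    libxml_clear_errors();
    $a = $doc->getElementsByTagName('a'); // Get all of the 'a' tags (a DOMNodeList) and store them in $a
    foreach ($a as $link) { // Iterate through $a
        $links[] = $link->getAttribute('href'); // Get the contents of each 'a' tag's 'href' attribute
    }
    return $links; // Return the array of links
}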
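If you want to try the same technique without fetching a live website, a variant can take an HTML string instead of a URL. The sketch below uses a hypothetical helper, getLinksFromHtml(), and also skips <a> tags that have no href (such as named anchors), which the function above would return as empty strings.

```php
<?php
// Hypothetical helper: same approach as getLinks(), but it parses an HTML
// string directly, so no network access is needed.
function getLinksFromHtml(string $html): array {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Don't print warnings for sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // Restore the caller's setting

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        // Skip <a> tags without an href attribute (e.g. named anchors)
        if ($a->hasAttribute('href')) {
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}

$html = '<p><a href="/docs">Docs</a> <a name="top">Top</a> <a href="https://example.com/">Example</a></p>';
print_r(getLinksFromHtml($html));
```

For that sample string the result is an array holding /docs and https://example.com/, in the order they appear in the page.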
You could then use it like this:
$links = getLinks('http://jquery.com');
foreach ($links as $link) {
    echo $link, "\n"; // Print one link per line
}
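Pages often link to the same URL several times. One alternative worth knowing about (a different technique from the function above) is PHP's DOMXPath, where the query //a/@href selects the href attributes directly. This sketch, using a hypothetical helper named uniqueLinks(), also drops duplicates before returning.

```php
<?php
// Hypothetical helper using DOMXPath instead of getElementsByTagName()
function uniqueLinks(string $html): array {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // Hide warnings for malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    // '//a/@href' matches the href attribute node of every <a> that has one
    foreach ($xpath->query('//a/@href') as $attr) {
        $links[] = $attr->value; // DOMAttr's value property holds the attribute text
    }
    // Remove duplicates, then reindex so the result is a plain list
    return array_values(array_unique($links));
}

$html = '<a href="/a">A</a><a href="/b">B</a><a href="/a">A again</a>';
foreach (uniqueLinks($html) as $link) {
    echo $link, "\n"; // One link per line
}
```

Here /a appears twice in the markup but is printed only once.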
I hope you found this function useful. Let me know in the comments if you have any questions.