List all links on a web page using DOM

Have you heard of DOM before? If not, DOM stands for Document Object Model, a standard interface for accessing the elements (tags) and attributes of XML and XHTML documents, such as <book_title> in XML or <img src="pie.jpg"> in XHTML ('src' being the attribute).
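To make that concrete, here is a minimal sketch using PHP's built-in DOM extension to read the text of a <book_title> element and the 'src' attribute of an <img> tag. The XML and HTML snippets are made-up examples.

```php
<?php
// XML: read the text inside a <book_title> element (sample document is made up).
$xml = new DOMDocument();
$xml->loadXML('<library><book_title>Moby-Dick</book_title></library>');
$title = $xml->getElementsByTagName('book_title')->item(0)->textContent;
echo $title . PHP_EOL;

// HTML: read the 'src' attribute of an <img> tag (sample markup is made up).
$html = new DOMDocument();
$html->loadHTML('<html><body><img src="pie.jpg"></body></html>');
$src = $html->getElementsByTagName('img')->item(0)->getAttribute('src');
echo $src . PHP_EOL;
```

The same two calls, getElementsByTagName() to find elements and getAttribute() to read an attribute, are all the link-listing function needs.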

Once you start playing with DOM, you realize how much potential it has for extracting information from web pages.

Here is a function I wrote using PHP's DOMDocument class, with comments to help explain each step.

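function getLinks($url) {
    $links = array(); // Always return an array, even if the page has no links
    $html = file_get_contents($url); // Download the raw HTML of the page
    if ($html === false || $html === '') {
        return $links; // The fetch failed or the page is empty
    }
    $doc = new DOMDocument(); // Create a new DOMDocument object
    libxml_use_internal_errors(true); // Don't emit warnings for malformed real-world HTML
    $doc->loadHTML($html); // Parse the HTML into $doc
    libxml_clear_errors(); // Discard the collected parse errors
    $anchors = $doc->getElementsByTagName('a'); // A DOMNodeList of every <a> element
    foreach ($anchors as $anchor) { // Iterate through the <a> elements
        $links[] = $anchor->getAttribute('href'); // The 'href' value, or '' if there is none
    }
    return $links; // Return the array of links
}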
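Because getLinks() downloads the page itself, it can be handy to split out the parsing step so you can try it on a literal HTML string without network access. The sketch below does that with a hypothetical helper, getLinksFromHtml(), and also skips <a> tags that have no href at all (such as <a name="top"> anchors) instead of returning empty strings for them.

```php
<?php
// Hypothetical helper: extract href values from an HTML string (no network access).
function getLinksFromHtml(string $html): array {
    $links = [];
    if ($html === '') {
        return $links; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) { // skip <a name="..."> style anchors
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

$sample = '<p><a href="/docs">Docs</a> <a name="top">Top</a> <a href="https://example.com/">Ex</a></p>';
print_r(getLinksFromHtml($sample)); // the two href values: '/docs' and 'https://example.com/'
```

Note that the hrefs come back exactly as written in the page, so relative links like '/docs' are not turned into full URLs.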

To use getLinks(), you could write something like:

$links = getLinks('http://jquery.com');
foreach ($links as $link) {
    echo $link . PHP_EOL; // Print one link per line
}
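If you are printing the links into a web page rather than a terminal, you will probably want to remove duplicates and escape each URL first. Here is a minimal sketch using a hypothetical renderLinkList() function; the sample array stands in for whatever getLinks() returns so the snippet runs on its own.

```php
<?php
// Hypothetical helper: render an array of links as an HTML <ul>.
function renderLinkList(array $links): string {
    $items = '';
    foreach (array_unique($links) as $link) { // drop duplicate URLs, keep the first occurrence
        $safe = htmlspecialchars($link, ENT_QUOTES); // escape &, <, > and quotes
        $items .= '<li><a href="' . $safe . '">' . $safe . "</a></li>\n";
    }
    return "<ul>\n" . $items . "</ul>\n";
}

// Made-up sample standing in for the output of getLinks().
$links = ['/download', '/download', 'https://example.com/?a=1&b=2'];
echo renderLinkList($links);
```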

I hope you found this function useful. Let me know in the comments if you have any questions.
