This is a guide to getting started with PHP DOM, or a quick reminder for those who have not used it in a while. The extended documentation is on PHP.net, but it is quite long. Here you will find a quick reference to get started in no time.
Purpose of the DOM (Document Object Model): it is a convention for representing and manipulating objects in XML, XHTML and HTML documents. Parsing XML and HTML files is very useful: it lets you manipulate RSS feeds, interact with APIs and web services through XML (e.g. Google Maps, Facebook and Twitter APIs), extract information from websites (web crawling) and more.
Getting Started
The DOM implementation in PHP has more than 15 classes! But don't be afraid: in most cases you will end up using just these: DOMNode, DOMDocument, DOMNodeList and DOMElement. The following UML class diagram of PHP's DOM shows how these classes are related to each other, followed by an explanation of each one.
Loading and Saving DOM Documents
DOMDocument: this class extends DOMNode. It holds the XML (or HTML) elements and the document's configuration, exposed as properties such as formatOutput, preserveWhiteSpace and xmlVersion.
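To make the configuration properties concrete, here is a minimal sketch that turns on pretty-printing. The XML string is hypothetical sample data, not taken from a real feed.

```php
<?php
// Hypothetical sample data: a compact XML string with no line breaks.
$xml = '<feed><item><title>Hello</title></item></feed>';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // set before loading: drops whitespace-only text nodes
$dom->formatOutput = true;        // set before saving: indents the serialized output
$dom->loadXML($xml);

$pretty = $dom->saveXML();        // the XML as a string, now spread over several lines
echo $pretty;
```

Setting preserveWhiteSpace after the document is loaded has no effect on nodes that were already parsed, which is why it comes first here.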
DOMDocument must-know methods (part 1: load and save)
- Load: load XML (or HTML) documents. There are several load methods, and their names are fairly self-explanatory:
- mixed DOMDocument::load ( string $filename ) — Load XML from a file
- bool DOMDocument::loadHTML ( string $source ) — Load HTML from a string
- bool DOMDocument::loadHTMLFile ( string $filename ) — Load HTML from a file
- mixed DOMDocument::loadXML ( string $source ) — Load XML from a string
- Save: output the whole DOM document, either as a string (for example, to print it to the screen) or to a file.
- int DOMDocument::save ( string $filename ) — Dumps the internal XML tree back into a file
- string DOMDocument::saveHTML ( ) — Dumps the internal document into a string using HTML formatting
- int DOMDocument::saveHTMLFile ( string $filename ) — Dumps the internal document into a file using HTML formatting
- string DOMDocument::saveXML ( ) — Dumps the internal XML tree back into a string
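The load and save methods listed above can be sketched as a simple round trip. The XML and HTML strings and the temporary file are hypothetical, chosen only to keep the example self-contained.

```php
<?php
// Hypothetical sample data and a temporary file path, used only for illustration.
$xml  = '<?xml version="1.0"?><books><book>PHP DOM</book></books>';
$path = tempnam(sys_get_temp_dir(), 'dom');

$dom = new DOMDocument();
$dom->loadXML($xml);            // parse XML from a string
$bytes = $dom->save($path);     // write the tree to a file; returns bytes written or false

$copy = new DOMDocument();
$copy->load($path);             // parse XML from the file just written
$roundTrip = $copy->saveXML();  // serialize the tree back into a string
echo $roundTrip;

unlink($path);                  // clean up the temporary file

// The HTML variants work the same way, but use the HTML parser and serializer.
$page = new DOMDocument();
$page->loadHTML('<html><body><p>Hi</p></body></html>');
$html = $page->saveHTML();
echo $html;
```

Note that loadXML() and loadHTML() take the markup itself, while load() and loadHTMLFile() take a path to read it from.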
Example using DOMDocument for loading and showing HTML:
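<?php
$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; collect parser warnings instead of printing them
// loadHTMLFile() reads from a file or URL; loadHTML() expects the markup itself as a string
$dom->loadHTMLFile('http://www.adrianmejiarosario.com'); // load website content into the DOM
echo $dom->saveHTML(); // print to screen
?>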
Iterating through DOM Elements
The first thing you need to do after loading the XML you want to process is to select the data you are interested in. To find your data, you need to iterate through the DOM elements, so you need to know which methods and objects are used in this process.
DOMDocument must-know methods (part 2: get data)
- DOMElement DOMDocument::getElementById ( string $elementId ) - Searches for an element with a certain id.
- DOMNodeList DOMDocument::getElementsByTagName ( string $elementName ) - Searches for all elements with the given tag name.
- DOMNode DOMNodeList::item ( int $index ) - Retrieves a node specified by index
- int $DOMNodeList->length - Node list length
- string $DOMNode->nodeName - Returns node name
- string $DOMNode->nodeValue - Returns node value
- DOMNodeList $DOMNode->childNodes - Returns the list of child nodes
<?php
//TODO
?>
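As a minimal sketch of the iteration described in this section, the following walks a node list by index and then lists the children of one element. The HTML string and the "menu" id are hypothetical sample data standing in for a real page.

```php
<?php
// Hypothetical sample markup standing in for a real page.
$html = '<html><body>'
      . '<ul id="menu"><li>Home</li><li>Blog</li><li>About</li></ul>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementsByTagName() returns a DOMNodeList; walk it by index with item().
$items  = $dom->getElementsByTagName('li');
$labels = [];
for ($i = 0; $i < $items->length; $i++) {
    $node = $items->item($i);
    $labels[] = $node->nodeName . ': ' . $node->nodeValue;
}
print_r($labels);

// getElementById() returns a single element or null; childNodes lists its children.
$menu = $dom->getElementById('menu');
$childNames = [];
if ($menu !== null) {
    foreach ($menu->childNodes as $child) {
        $childNames[] = $child->nodeName;
    }
}
print_r($childNames);
```

A DOMNodeList can also be traversed with foreach, as the second loop shows; the index-based loop is used first only to illustrate item() and length.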