PHP has the super simple, and indeed aptly named, SimpleXMLElement class. Give it a string or a URL and voilà, parsed XML. Except, wait, the arch-nemesis of native PHP objects looms ahead: serialization! A SimpleXMLElement cannot be passed to serialize(), which becomes a problem as soon as you want to cache a parsed feed.
Now, let's wrap it in a class that handles serialization, and add some extras for Google Data feeds, like etags and a default versioning header.
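To make the problem concrete, here is a minimal sketch of what happens when a SimpleXMLElement is serialized directly. The sample XML is made up for illustration; the workaround at the end (serialize the raw string, re-parse later) is the same idea the class in this post builds on.

```php
<?php
// A tiny, made-up document standing in for a real feed.
$xml = simplexml_load_string('<feed><title>Example</title></feed>');

$error = null;
try {
    // SimpleXMLElement wraps an internal libxml structure,
    // so PHP refuses to serialize it and throws instead.
    serialize($xml);
} catch (Exception $e) {
    $error = $e->getMessage();
}

// The raw XML string serializes fine and can be re-parsed afterwards.
$raw  = $xml->asXML();
$copy = simplexml_load_string(unserialize(serialize($raw)));

echo (string) $copy->title; // prints "Example"
```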
class Feed {

    /**
     * Etag
     * @var string
     * @see http://code.google.com/apis/gdata/docs/2.0/reference.html#ResourceVersioning
     */
    public $etag;

    /**
     * A list of namespaces used in this feed
     * @var array
     */
    public $namespaces = array();

    /**
     * Internal xml object
     * @var SimpleXMLElement
     */
    protected $_xml;

    /**
     * Default cURL headers
     * @var array
     */
    private static $default_headers = array(
        'GData-Version: 2',
    );

    /**
     * Constructor
     */
    protected function __construct($feed) {
        $this->_xml = $feed;

        if (is_string($this->_xml)) {
            $this->_xml = simplexml_load_string($feed);
        }

        if ($this->_xml instanceof SimpleXMLElement) {
            $this->namespaces = (array) $this->_xml->getDocNamespaces();

            // Only read the etag when the feed actually declares the gd namespace.
            if (isset($this->namespaces['gd'])) {
                $this->etag = (string) $this->_xml->attributes($this->namespaces['gd'])->etag;
            }
        }
    }

    /**
     * Get variables from the xml feed
     * @param string $var
     * @return mixed
     */
    public function __get($var) {
        // Allow namespaces to be accessed as $this->namespace->var...
        if (array_key_exists($var, (array) $this->namespaces)) {
            return $this->_xml->children($this->namespaces[$var]);
        }
        else if ($var == 'namespaces') {
            return $this->namespaces;
        }
        else return $this->_xml->$var;
    }

    /**
     * Load a feed either from the web or from the cache
     * @param string $url
     * @param string $class
     * @return Feed
     */
    public static function load($url, $class = 'Feed') {
        if (!IN_PRODUCTION) cache::delete($url); // Always clear cache in dev mode.

        // Check if there is a cached version.
        $feed = cache::read($url);

        // Try and get the feed using the etag.
        try {
            $headers = self::$default_headers;
            if ($feed !== null) {
                $headers[] = 'If-None-Match: '.$feed->etag;
            }

            /**
             * @see http://www.php.net/manual/en/function.curl-setopt.php
             */
            $raw_feed = remote::get(
                $url,
                array(
                    CURLOPT_HTTPHEADER => $headers,
                    // CURLOPT_HEADER => true, // For debugging
                ));

            $feed = new $class($raw_feed);
            cache::write($url, $feed);
        }
        catch (Exception $e) {
            // Catch 304 errors - Content Not Modified
            if ($e->getCode() == 304) {
                // Return cached version
                return $feed;
            }
            else throw $e;
        }

        return $feed;
    }

    /**
     * __wakeup - unserialize
     */
    public function __wakeup() {
        /*
         * Reload the SimpleXMLElement from the raw,
         * serialized string.
         */
        $this->_xml = simplexml_load_string($this->_xmlRaw);
        unset($this->_xmlRaw);
    }

    /**
     * __sleep - serialize
     */
    public function __sleep() {
        /*
         * The SimpleXMLElement $this->_xml can't be serialized,
         * so we have to get the raw xml feed. However, this
         * instance may still be used after serialization, so
         * we can't just reassign $this->_xml. We use a temp
         * variable, _xmlRaw, to store the raw feed for
         * serialization.
         */
        if ($this->_xml instanceof SimpleXMLElement)
            $this->_xmlRaw = $this->_xml->asXML();

        // Serialize every member except the live SimpleXMLElement.
        $members = array_keys(get_object_vars($this));
        unset($members[ array_search('_xml', $members) ]);

        return $members;
    }
}
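Stripped of the Kohana and Google specifics, the __sleep()/__wakeup() round trip can be sketched on its own. XmlHolder and its property names are hypothetical, chosen only for this illustration; unlike Feed, it declares the temporary raw-XML property up front rather than creating it on the fly.

```php
<?php
// Hypothetical minimal wrapper, for illustration only.
class XmlHolder
{
    /** @var SimpleXMLElement|null Live parsed document, never serialized */
    protected $xml;

    /** @var string|null Raw XML, only populated while serializing */
    protected $raw;

    public function __construct($source)
    {
        $this->xml = simplexml_load_string($source);
    }

    // Forward unknown properties to the wrapped SimpleXMLElement.
    public function __get($name)
    {
        return $this->xml->$name;
    }

    // Store the raw XML and serialize only that member.
    public function __sleep()
    {
        $this->raw = $this->xml->asXML();
        return array('raw');
    }

    // Re-parse the raw XML after unserialize() and drop the temporary copy.
    public function __wakeup()
    {
        $this->xml = simplexml_load_string($this->raw);
        $this->raw = null;
    }
}

$holder   = new XmlHolder('<feed><title>Example</title></feed>');
$restored = unserialize(serialize($holder));

echo (string) $restored->title; // prints "Example"
```

Declaring the temporary property is a deliberate difference from Feed: PHP 8.2 and later emit a deprecation notice when a property is created dynamically, which is what assigning to an undeclared _xmlRaw does.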
One caveat: this is meant to work with Kohana 3, but could easily be adapted to standalone code. With Kohana, though, the remote::get() call needs one fix to work correctly. The vanilla code throws an exception on anything but an HTTP 200 response, which is fine, but the exception does not carry the returned status code. Google responds with 304 (Not Modified) when the etag we send still matches, so load() needs that code to tell "not modified" apart from a real error. See this issue. The fix is really simple: pass the status code as the exception's code argument.
Kohana: remote.php
// Close the connection
curl_close($remote);

if (isset($error))
{
    throw new Kohana_Exception('Error fetching remote :url [ status :code ] :error',
        array(':url' => $url, ':code' => $code, ':error' => $error),
        $code);
}

return $response;
Usage
So now what?
We have a Feed object that wraps the parsed XML, plus a static method, load(), that handles caching the feed and checking the etag. Because the constructor is protected, load() is how you get an instance. It can even return an instance of a class that inherits from Feed and does some extra processing on the feed.
load($url, $class = 'Feed')
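Static method that loads the feed at the given URL. It caches the feed and sends the cached etag with each request to check whether the feed has been updated. When it hasn't, the server answers with a 304 and the cached copy is returned, which saves you the time of downloading the feed again.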
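For standalone code without Kohana's remote::get(), the conditional request could be sketched with plain cURL along these lines. The function names conditional_headers() and fetch_if_changed() are hypothetical helpers, not part of Kohana or of the Feed class; only the cURL calls themselves are standard.

```php
<?php
/**
 * Build the request headers, adding If-None-Match when an etag is cached.
 * (Hypothetical helper for this sketch.)
 */
function conditional_headers($etag = null)
{
    $headers = array('GData-Version: 2');
    if ($etag !== null && $etag !== '') {
        $headers[] = 'If-None-Match: ' . $etag;
    }
    return $headers;
}

/**
 * Fetch $url and return array($status, $body).
 * $body is null when the server answers 304 Not Modified,
 * which tells the caller to keep using its cached copy.
 * (Hypothetical helper for this sketch.)
 */
function fetch_if_changed($url, $etag = null)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => conditional_headers($etag),
    ));

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return array($status, $status === 304 ? null : $body);
}
```

Returning the status alongside the body, instead of throwing on 304, is one possible design; it avoids using an exception for what is really the expected "nothing changed" case.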
__get($var)
Ah, magic methods. This lets us access the underlying SimpleXMLElement as $feed->title, $feed->entry, etc. It also treats every namespace prefix declared in the feed as a property that returns the children in that namespace, so a feed that declares a gdata prefix gives you $feed->gdata->id, etc.
$feed = feed::load($url);

echo $feed->title;
echo $feed->gdata->id;
echo $feed->link->attributes()->href;

foreach ($feed->entry as $entry) {
    // Loop all the entries from this feed
}
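What __get() does with namespaces can be seen with SimpleXMLElement alone. The document below is a made-up sample: the ex prefix and its URI are invented for illustration, while gd:etag follows the Google Data convention the constructor relies on.

```php
<?php
// Made-up sample feed; the "ex" namespace is purely illustrative.
$source = '<feed xmlns="http://www.w3.org/2005/Atom"'
        . ' xmlns:gd="http://schemas.google.com/g/2005"'
        . ' xmlns:ex="http://example.com/ns"'
        . ' gd:etag="W/&quot;abc&quot;">'
        . '<title>Example</title>'
        . '<ex:id>42</ex:id>'
        . '</feed>';

$xml = simplexml_load_string($source);

// Maps each declared prefix to its URI; the default namespace has the key ''.
$namespaces = $xml->getDocNamespaces();

// Un-prefixed elements are reachable directly...
$title = (string) $xml->title;

// ...but prefixed elements and attributes need their namespace URI.
$id   = (string) $xml->children($namespaces['ex'])->id;
$etag = (string) $xml->attributes($namespaces['gd'])->etag;

echo $title, ' ', $id, ' ', $etag;
```

Feed::__get() wraps exactly the children() lookup, keyed by prefix, which is why the property name you use has to match a prefix the feed itself declares.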