interfacelab


Metadata/Attributes in PHP

There’s a ton of stuff I miss from C#, having moved to PHP. Did I say a ton? I meant a megaton.

One of the things I miss the most (besides the sanity) is attributes (annotations for Java peeps). Being able to ascribe metadata to class, method and property definitions opens up a whole new world of introspection, which enables you to do some pretty wicked hacks.

In this post, I present a PHP class that allows you to do metadata/attribute programming with PHP.  You can download the class here.  But before we dig in, we must understand what attributes are and how they are useful…

Introduction to Attributes

For those who don’t know about attribute-based programming, let me quote an excerpt from OnDotnet:

Attributes are a mechanism for adding metadata, such as compiler instructions and other data about your data, methods, and classes, to the program itself. Attributes are inserted into the metadata and are visible through ILDasm and other metadata-reading tools.

Reflection is the process by which a program can read its own metadata. A program is said to reflect on itself, extracting metadata from its assembly and using that metadata either to inform the user or to modify its own behavior.

One of the last things I wrote in C# before switching over was a simplistic REST-esque framework for web services for C#. I used attributes on classes to mark which ones were exposed to the outside world and then used attributes on the methods to specify such things as the URI, description, etc. This allowed the services to be discoverable versus declared and to be self documenting. For example:

namespace SimpleWebService
{
    [RestService(baseURI = "/simple/", Name = "Simple", Description = "Simple service", Persistent = false)]
    class Service
    {
        [RestMethod(URI = "something", Name = "Do Something", Description = "Does something.")]
        public void DoSomething()
        {
            // …
        }
    }
}

In the above example, the attributes are declared in the brackets.  On our Service class, we give the service a name, define its base URI, give it a description and declare whether the service is persistent between requests (meaning that the class is instantiated only once and reused, rather than being instantiated on every request).  On the DoSomething method it’s a similar deal: we declare the URI endpoint for the method, the name of the method and its description.

Going to http://example.com/simple/something would eventually invoke the DoSomething method on our Service class.  The beautiful thing here is that my classes needn’t implement any interfaces or descend from any parent classes to be exposed as a web service.  I could drop the attributes on a class, recompile and it would then be instantly available.  When the application loaded, it scanned all of the classes in the loaded assemblies, did some reflection on them and created a cache of exposed services.  An HTTPHandler would dispatch the incoming requests to the corresponding service, map any POST/GET or URI fragments to the parameters of the method being called and then return the results of the method call as serialized XML.

Nice and easy.

Attributes and PHP

PHP has no built-in mechanism for declaring attributes, but it does have a basic introspection capability.  For any class or function in PHP, there is a set of reflection classes that you can use to get information about it.  For instance, with methods, you can get its parameters and any associated block of comments:

class DumbClass
{
    /**
     * Comment block
     */
    public function thing($parameter,$another)
    {
        return false;
    }
}

$method=new ReflectionMethod('DumbClass','thing');
echo $method->getDocComment();

In the above example, we create an instance of the ReflectionMethod class and echo the comment block.  If we run this script, the output would look like:

    /**
     * Comment block
     */
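
The same ReflectionMethod object also exposes the parameters mentioned above.  Here is a minimal sketch that reuses DumbClass from the example; the describeMethod helper is a made-up name for illustration, not part of PHP or of the class presented in this post:

```php
<?php
class DumbClass
{
    /**
     * Comment block
     */
    public function thing($parameter, $another)
    {
        return false;
    }
}

// Hypothetical helper: collect the parameter names and the raw doc comment.
function describeMethod($class, $method)
{
    $reflection = new ReflectionMethod($class, $method);

    $names = array();
    foreach ($reflection->getParameters() as $parameter) {
        $names[] = $parameter->getName(); // name without the leading "$"
    }

    return array(
        'parameters' => $names,
        'comment'    => $reflection->getDocComment(), // false if there is none
    );
}

$info = describeMethod('DumbClass', 'thing');
echo implode(', ', $info['parameters']), "\n"; // parameter, another
```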

Now if we put 2 and 2 together, you’ll see that we could use the comment block to insert our class metadata.  Now all we need …

Introducing AttributeReader

The AttributeReader class is a simple class that extracts YAML from the comment block on a class, its methods or properties.  You can download the class here.  Note:  You must have the pecl syck package installed, details are here.

Since we have access to the comment block, it’s the most obvious place to express our metadata.  Obviously, the 100% correct solution would be to patch PHP to support attributes out of the box, but that’s a much bigger effort that would require some serious coding.  The performance of this method is fast enough for most use cases.

So how do we use this?  At massify, we use metadata for our ORM layer.  All of our models have metadata attached to them that define what database table to use, the column name, etc.  Here’s an example:

/**
 * Sample model
 *
 * [[
 * table: sample.item
 * database: default
 * read_only: false
 * ]]
 *
 */
class Item extends Model
{
    //@ fields

    /**
     * [[
     * label: Title
     * type: string
     * length: 32
     * description: Title of the item
     * validate:
     *   required: true
     *   length: 8-32
     *   unique: true
     * ]]
     */
    public $title;

    /**
     * [[
     * label: URI
     * type: string
     * length: 32
     * description: URI of the item
     * validate:
     *   required: true
     *   length: 4-32
     *   unique: true
     *   format: alpha_numeric
     * ]]
     */
    public $uri;

    /**
     * [[
     * label: Description
     * type: text
     * description: Description of the item
     * ]]
     */
    public $description;

    //@ end fields
}

In the above example, our metadata is nestled between double brackets [[]].  Inside the double brackets is YAML expressing the metadata.  The cool thing here is that we can nest attributes of the metadata.  Let’s look at what we’ve done:
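
To make the double-bracket convention concrete, here is a rough sketch of how the YAML text could be pulled out of a comment block before it is handed to a YAML parser.  This is not the actual AttributeReader code: extractMetadataBlock and the trimmed-down SampleItem class are names invented for this illustration, and the parsing step itself is left out.

```php
<?php
// Hypothetical helper: return the text between [[ and ]] in a doc comment,
// with the leading " * " comment gutter removed from every line.
function extractMetadataBlock($docComment)
{
    if (!preg_match('/\[\[(.*?)\]\]/s', $docComment, $matches)) {
        return null; // no metadata in this comment
    }

    $lines = array();
    foreach (preg_split('/\R/', $matches[1]) as $line) {
        // Strip indentation, the "*" and one optional space; any further
        // indentation is kept because YAML nesting depends on it.
        $line = preg_replace('/^\s*\* ?/', '', $line);
        if (trim($line) !== '') {
            $lines[] = rtrim($line);
        }
    }

    return implode("\n", $lines);
}

// Trimmed-down stand-in for the Item model above.
class SampleItem
{
    /**
     * [[
     * label: Title
     * validate:
     *   required: true
     * ]]
     */
    public $title;
}

$property = new ReflectionProperty('SampleItem', 'title');
echo extractMetadataBlock($property->getDocComment()), "\n";
```

The result is a plain YAML string with the nesting under validate intact, ready for whatever YAML parser you have available.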

On the declaration for the Item class, we’ve described metadata that tells us which database table this model represents, which database it resides in and if it’s read_only (a view, in database parlance).  On the properties of the model, the metadata describes the label to use for forms, what the database type is (simplified), the length (for string types), a description and a list of validators.  For instance, on the $uri property, we’ve declared validators that make sure the property has a value before saving (required), is of a specified length (4-32 characters), is unique in the database (unique) and matches a specific format (alpha_numeric).

Since the class extends from Model, the model knows to use the metadata to build the correct SQL statements for inserts, updates, etc.  It also knows how to validate itself when saved to the database, returning the correct errors to the user if any of the validations fail.  Finally, we are able to automatically build forms to edit the models by using the metadata as a guide when constructing the form in code.
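
As a rough illustration of metadata-driven validation, the sketch below checks a value against the required and length rules.  It assumes the YAML has already been parsed into a nested array and that the length rule arrives as a string such as "4-32".  The validateValue function and its error messages are invented here; they are not taken from the Model class.

```php
<?php
// Hypothetical sketch: rule names mirror the YAML above, everything else is made up.
function validateValue($value, array $metadata)
{
    $errors = array();
    $rules  = isset($metadata['validate']) ? $metadata['validate'] : array();
    $label  = isset($metadata['label']) ? $metadata['label'] : 'Field';

    if (!empty($rules['required']) && ($value === null || $value === '')) {
        $errors[] = "$label is required.";
        return $errors; // no point checking the length of a missing value
    }

    if (isset($rules['length']) && $value !== null && $value !== '') {
        // "4-32" is read as: between 4 and 32 characters, inclusive.
        list($min, $max) = array_map('intval', explode('-', $rules['length']));
        $length = strlen($value);
        if ($length < $min || $length > $max) {
            $errors[] = "$label must be between $min and $max characters.";
        }
    }

    return $errors;
}

// Metadata as it might look after the YAML for $uri has been parsed.
$uriMetadata = array(
    'label'    => 'URI',
    'type'     => 'string',
    'validate' => array('required' => true, 'length' => '4-32'),
);

print_r(validateValue('', $uriMetadata));      // one "required" error
print_r(validateValue('abc', $uriMetadata));   // one length error
print_r(validateValue('about', $uriMetadata)); // no errors
```

The validator knows nothing about Item in particular; it only knows the shape of the metadata, so the same function would work for any property on any model.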

All of this is completely possible to do without using metadata, but would require a lot of redundant code with a predetermined set of use cases.  Metadata frees us from this because we can write use cases that work with a known quantity, rather than the other way around.

Using the AttributeReader

Using the class is straightforward.  Here’s an example.  Let’s assume we have the Item model from the previous example loaded and want to extract metadata for it and its URI property:

$item=new Item();

$class=AttributeReader::ClassAttributes($item);

echo $class->database;
echo $class->table;

$method=AttributeReader::PropertyAttributes($item,'uri');

echo $method->label;
if ($method->validate->required)
    echo "Required field";
else
    echo "Not required.";

It’s that simple.

Caveats

Make sure you are familiar with the rules of YAML.

Performance is acceptable, but always make sure you measure performance for yourself.  We use a caching strategy via APC at massify for caching model metadata.  This requires that you restart apache if your models change (you can disable APC on your development/staging to get around this).  With the APC caching strategy, it’s super fast.  Find out what works best for you.
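
A caching layer along these lines might look like the sketch below.  It is an assumption-laden illustration, not our production code: cachedMetadata and the key prefix are made up, and it uses the apcu_fetch() and apcu_store() functions from APCu (the successor to APC's user cache), falling back to a per-request array when the extension is not loaded.

```php
<?php
// Hypothetical sketch: cache parsed metadata per class name.
function cachedMetadata($className, callable $loader)
{
    static $local = array(); // per-request cache, also the fallback without APCu
    $key = 'metadata:' . $className;

    if (isset($local[$key])) {
        return $local[$key];
    }

    $useApcu = function_exists('apcu_fetch') && function_exists('apcu_store');
    if ($useApcu) {
        $cached = apcu_fetch($key, $found);
        if ($found) {
            return $local[$key] = $cached;
        }
    }

    $metadata = $loader($className); // the expensive reflection + parsing step
    if ($useApcu) {
        apcu_store($key, $metadata); // no TTL: kept until the cache is cleared
    }

    return $local[$key] = $metadata;
}

// Stand-in loader that counts how often the expensive step actually runs.
$calls  = 0;
$loader = function ($className) use (&$calls) {
    $calls++;
    return array('table' => 'sample.item', 'database' => 'default');
};

$first  = cachedMetadata('Item', $loader);
$second = cachedMetadata('Item', $loader);
echo $calls, "\n"; // 1 on a cold cache; the second call never reaches the loader
```

Because nothing ever invalidates the stored entry, a changed model keeps its old metadata until the cache is cleared, which is the restart requirement described above.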

I hope people find this useful; I think it opens a whole new world for serious PHP development.  Feel free to drop me an email if you have questions or successes.

  • Excellent idea...in fact dare I say brilliant? :P

    I've been poking around for a while now trying to figure out a way to use reflection that would be "interesting" and "innovative" I think this might be it.

    The part where you said, using reflection to obtain meta data and have that information be discoverable rather than declarative...I'm not sure I agree 100% but the idea of marking my methods as REST-able (for lack of a better word) by indicating that in the comment as meta-data...

    Damn thats a good idea and I owe it all to this article. :P

    Cheers,
    Alex
  • Kevin
    Thank you so much... You just saved me a ton of time.

    Any speed issues using this approach?
  • Well, it definitely isn't blazing fast, but I don't believe the overhead is too taxing. That said, we use APC's shared memory cache to cache metadata so that we only really load it when the application starts up (the cache sticks around until you restart the server or until APC restarts itself). Using this strategy, once it's cached it's barely noticeable.

    You could do the same thing using memcached, but APC's is much faster as the memory is local to its process.
  • Kevin
    I had an idea that might work out well as a replacement for the format. Maybe using a JSON object and parsing it with json_decode. That way it's converted to an associative array on its own without the need to convert it manually.
  • JSON would be great here as well. Honestly, the only reason I chose YAML is because all of our other configuration is done with that.
  • quocbao
    Your post was quite wonderful :D , I just translate the technique, give your example.

    Thanks for your nice post :)
  • david
    I use annotations with a array style notation like:

    /**
    * @someAnno(1=>2, "yeah"=> array("no","yes"))
    */

    It's really too bad that the language itself doesn't support it (outside doc comments) as it would be much faster and not require the caching.
  • jgmassify
    Nice one.

    Yeah, it's unfortunate that this is really a huge hack to get a language feature that would be a nice to have.

    I'm going to be re-posting a patch for PHP that enables array shortcut notation in the next couple of days. The patch was originally by Ryusuke Sekiyama, posted here: http://marc.info/?l=php-internals&m=119995972028293&w=2

    Instead of using array() you can use [], eg:

    $var=[1,2,3,4];
    $othervar=[ 'test' => [1,2], 'sick' => [1,[3,4]]];
  • That's VERY interesting
  • That's an interesting technique.

    What I'm more interested in is why you left c#. I'm in the same boat. At work I do c# and it's pretty damned nice. I also maintain a boatload of wordpress blogs so I have to learn PHP.
  • iflyhigh
    I'm the CTO for massify.com. When I signed on the site was being written in PHP. Unfortunately, it was a hideous wreck so we had to rewrite it, but thought we could use some of the code that had already been written. That wasn't the case however.

    Honestly, I don't mind it, there are some benefits to it, but I do miss C# and am sort of kicking myself for not having done the rewrite in Python.

    C'est la vie. :)