Category Archives: Planet Ibuildings

  • [Lorna Mitchell] PHP London Conference: In Review March 8, 2010

    I’m really late with this post, but I wanted to write about the PHP London Conference which was held in London last Friday. The event was in a great venue and had hordes of people – this was my fourth year in attendance!! They do, however, have the longest Twitter tag in history: #phpuk2010!

    This year I had the privilege of speaking at this event, although I was concerned that I had to stay coherent and alert right through to the graveyard slot at 4:30pm (conference organisers take note: I really am much sparklier in the mornings!). I kept myself awake by attending what I affectionately refer to as the “Ibuildings track” – with 4 speakers at the event, it did feel like a bit of an invasion by myself and my colleagues. In our defence I can only say that we are a pretty big local PHP employer and, as a developer, I’m happy to be working for someone who sends all their developers to these events, and even happier to be in the company of those other excellent speakers as colleagues!

    My talk was entitled “Best Practices in Web Service Design” although perhaps “Things I Wish Web Service Creators Would Consider Before Writing Unclear and Unstable Useless And Frustrating Services” would have been a better title! I talked about web services in general, a bit about HTTP and the various service types, and also gave some general tips and tricks for writing good, stable services. In a bit of a break with geeky tradition, I then talked about services as a whole package, and how to deliver and document them in a way that helps users help themselves. If you are interested the slides are here:

    http://www.slideshare.net/lornajane/best-practices-in-web-service-design

    The experience was overall very positive for me; I hadn’t spoken at this conference before and I was very pleased to be included. My talk went quite smoothly, with my nerves nicely hidden away (I’ve had issues with this lately), and I also avoided falling over either the curtain or the piece of screen that was carefully placed to trip unwary speakers! I’d like to thank everyone who came and asked questions afterwards, and all those who saw my talk and left comments for me on my joind.in talk page – it all helps me to do better next time. Thanks, and I’ll see you all next year!

  • [Ian Barber] Speaking March 1, 2010

    Thanks to everyone that came along to my talk about integrating search engines at the wonderful PHP UK 2010. The slides are available over at Slideshare. It was great speaking to so many people afterward about the challenges and solutions they’ve found with various engines, and the conference itself was top notch – congratulations to the organisers.

    I’ll be giving a (slightly shortened) version of the same talk at the Dutch PHP Conference 2010, which runs June 10-12th in the ever lovely Amsterdam. The schedule is absolutely chocker with great talks, from people like Elizabeth Naramore, Helgi, Thijs, Derick, Johannes and a load more. I’m particularly looking forward to Marcus’ talk on geo data, and I would be looking forward to Sam de Freyssinet’s talk on HMVC, except it’s on at the same time as mine!

  • [Vincenzo Russo] Drupal and its community issues February 20, 2010

    I love Drupal. I really do, even if many people do not understand why (one day I will explain to you what you don’t see, guys). I’ve been working with it since version 4.6. Then, during 2008/09, I stopped for about a year (for many reasons that are not worth going into), but eventually I came back aboard, and now I am also the “specialist in charge” for this platform at Ibuildings.

    For a long time I never had the chance to contribute “seriously” (just some little patches here and there) in the community, but now different working conditions allow me to do so. This brought to my attention some issues in the community attitude. And when it comes to social attitudes, I am not an easy person to deal with when I get disappointed.

    Maintainers can decline patches for any reason

    It all started with me trying to contribute a patch to the Content Copy module. You can read the whole thread as well as some of my recent posts to learn about all the technical implications.

    However, here I’ll be focusing on the political/sociological/philosophical aspect of the matter. I won’t go over every single disagreement again, but just the most important fact: markus_petrus refused the patch for no valid technical reason, but just out of personal concerns. He never brought a valid counter-example that proved the patch to be dangerous; he was just afraid the patch could cause harm somehow.

    (In addition, he later said that I should have used CRUD APIs to update a content type and not the import procedure, never realising that what my patch did was to use CRUD APIs to enhance the import procedure and make it able to import a new content type definition into a pre-existing copy).

    So, the point is: if maintainers have personal concerns not supported by any proof, they are still in the position to refuse the patch, whereas the correct behaviour should be to accept the patch, test it extensively, etc. etc. We are still talking about software, after all.

    Chain of trust might be dangerous

    The usual chain of trust we find in communities might be dangerous in this case, as it happened that someone else found markus_petrus’ argument perfectly reasonable. This, of course, led to the situation where my CVS application to publish the Alternate Content Copy module (a module created out of the disputed patch) was refused. The reason was simple: such a contribution is supposed to be a patch to the original module, not a module itself, and that makes sense. But at the same time the patch won’t be accepted. Deadlock.

    The CVS application process flaw

    This whole story led me to realise the big flaw in the Drupal CVS application process. There are some conditions that must be met for your application to be successful. And to be honest, they make sense. What a pity, though, that once you’ve got your CVS account, you can create as many projects as you want. First of all, this doesn’t make sense; secondly, it makes it too easy to bypass the conditions for the application: all you have to do is take over an abandoned module (easy) and apply for a CVS account (which you need to maintain the module you took over).

    Conclusions

    I am sure that everyone in the community has the best intentions, but when a situation like mine shows up, it becomes too easy for a community to mutate into an oligarchy. Also, having such a flaw in the CVS application process makes several efforts worthless.

  • [Lorna Mitchell] An iPhone App for Joind.in February 15, 2010

    Recently I’ve been doing some bits and pieces with the open sourced event feedback site joind.in, including some work on its API to facilitate development of an iPhone app. As a conference attendee, speaker and organiser, I use this site a lot for the various events that I am involved with and it’s a great asset.

    My boyfriend Kevin was thinking of developing an iPhone app, mostly to find out more about the technology, and I suggested he take a look at the API for joind.in and consider building something on that. The joind.in project belongs to enygma, a.k.a. Chris Cornutt from phpdeveloper.org, and he has the code available on github – so we grabbed it. The API wasn’t previously used by much, so we were able to tidy it up a bit and then consume it from the iPhone to suit our needs. Chris has accepted my alterations to his existing project with grace – even when I’ve totally broken the live site with them!!

    The joind.in site is a classic MVC setup and the API already existed within the application. It is implemented with a separate set of controllers for the various actions supported by the API, which all inherit from a controller which handles the output formats etc. for the XML and JSON responses. It isn’t the world’s best API but it’s perfectly sufficient for the task at hand – I intend to write some examples for using it but until then you can read this post from Derick about how he used the joind.in API to pull in comments on his talks onto his own site.

    The app itself has the core functionality of joind.in that an attendee would want in his pocket at an event. The events and their details are there, along with the talks at each event. Attendees can leave comments on the various talks and socials, and these can be browsed in the app as well. To give you a little taste of the app, here are some screenshots:

    [Screenshots: img_0023, img_0024, img_0025]

    If you have an iPhone or iPod touch and you’re attending an event any time soon, then download the app – it’s under “utilities” in the app store. Comments, suggestions, bug reports and feature requests are all gratefully received (no promises about fixing/implementing them but we’ll do our best!). Our app went from submission to approved in 3 days, which is very fast – thanks Apple!

  • [Lorna Mitchell] PHP and JSON February 12, 2010

    This is a quick outline on working with JSON from PHP, which is actually pretty simple to do. This post has some examples on how to do it and what the results should look like. JSON stands for JavaScript Object Notation, and is widely used in many languages (not just JavaScript) for serialisation. It is particularly popular for use in web services.

    Writing JSON From PHP

    Imagine we have a multidimensional array in PHP that looks something like this:

    $menu['starter'] = array( "prawn cocktail",
                              "soup of the day");
    $menu['main course'] = array( "roast chicken",
                                  "fish 'n' chips",
                                  "macaroni cheese");
    $menu['pudding'] = array( "cheesecake",
                              "treacle sponge");

    echo json_encode($menu);
     

    The output of this script looks like this:

    {"starter":["prawn cocktail","soup of the day"],"main course":["roast chicken","fish 'n' chips","macaroni cheese"],"pudding":["cheesecake","treacle sponge"]}

    This is pretty typical of a JSON output string – you can see the curly brackets to enclose the whole thing, then some square brackets to show the nesting levels within the key/value formats. JSON is an ideal format for many applications because it is easy to understand and debug, it’s quite concise, and many languages have built-in support just like PHP.

    Reading JSON Data From PHP

    Once we’ve serialised the data to a string, we might want to unserialise it again – and the PHP code for that is every bit as simple as the previous example, except that we use the function json_decode() instead of json_encode(). I’ve set the output of the previous script as the input to this one:

    $json = '{"starter":["prawn cocktail","soup of the day"],"main course":["roast chicken","fish \'n\' chips","macaroni cheese"],"pudding":["cheesecake","treacle sponge"]}';

    print_r(json_decode($json));
     

    This decodes the string and then dumps it using print_r() – the output of my script looked like this:

    stdClass Object
    (
        [starter] => Array
            (
                [0] => prawn cocktail
                [1] => soup of the day
            )

        [main course] => Array
            (
                [0] => roast chicken
                [1] => fish 'n' chips
                [2] => macaroni cheese
            )

        [pudding] => Array
            (
                [0] => cheesecake
                [1] => treacle sponge
            )

    )

    Note that the data isn’t identical to how it looked when it went in – PHP’s associative arrays have no direct JSON equivalent, so ours was encoded as a JSON object and came back as a stdClass object, and JSON doesn’t retain PHP-specific type information. So it’s perfect for a web service where we just want to convey the information, but may be too loose for other applications.
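
    If you would rather get arrays back, json_decode() accepts a second argument; passing true asks for associative arrays instead of stdClass objects. The snippet below is a minimal sketch of that round trip, using a shortened version of the menu (the variable names $asObject and $asArray are just for illustration):

```php
<?php
// A shortened version of the menu from the example above.
$menu = array(
    'starter' => array('prawn cocktail', 'soup of the day'),
    'pudding' => array('cheesecake', 'treacle sponge'),
);

$json = json_encode($menu);

// Default behaviour: JSON objects come back as stdClass instances.
$asObject = json_decode($json);

// With true as the second argument, JSON objects become associative arrays.
$asArray = json_decode($json, true);

var_dump($asObject instanceof stdClass); // the default decode gives an object
var_dump(is_array($asArray));            // the second decode gives an array
var_dump($asArray === $menu);            // and it matches what we started with
```

    For simple string data like this the array form survives the round trip unchanged; richer PHP values (objects of your own classes, for example) would not.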

    The examples here were taken from a talk I give about consuming web services – you can see all the slides on slideshare. If you have any additions or alternatives, leave a comment!

  • [Ivo Jansch] PHP as a template language February 11, 2010

    It’s been a while since I blogged, but I just ran into another zealot pointing me to NoSmarty.net when I mentioned templating.

    I think I’ve said it before. The tool you use should depend on the job you’re trying to do. So to say that Smarty is wrong just because it is, does not feel right.

    I agree that in many cases PHP can be used as a template language just fine, but there are situations where a Smarty template (or any other templating engine) is just that more pleasant.

    Here’s a bit of template code that I encountered yesterday. Its use of PHP as a template language is hideous. Because it’s a template for an XML message and because it needs to cope with systems with short open tags on and off, it looks like this:

     
    <?php echo '<'; ?>?xml version="1.0" encoding="UTF-8"?>
    <result processed="<?php echo $data["processed"]?"yes":"no"; ?>" <?php if (isset($data["orderid"])) { ?>orderId="<?php echo $data["orderid"]; ?>"<?php } ?> > <?php if (isset($data["error"])) { ?><error message="<?php echo $data["error"]; ?>" /><?php } ?>
    </result>
     

    Hideous!

    Here’s what it would look like in Smarty:

     
    <?xml version="1.0" encoding="UTF-8"?>
    <result processed="{if $data.processed}yes{else}no{/if}" {if $data.orderid}orderId="{$data.orderid}"{/if} > {if $data.error}<error message="{$data.error}" />{/if}
    </result>
     

    Yes, the first one is slightly more efficient, but the second one is actually readable for the average person.

    Anybody claiming that <?php } ?> is ‘just as convenient’ as {/if} does not think clearly.

    In my humble opinion, of course.

  • [Lorna Mitchell] PHPBenelux: Recap February 4, 2010

    Last weekend I was privileged to speak at the inaugural PHPBenelux conference in Antwerp, Belgium. Since Ibuildings is partly a Dutch company I combined this with one of my regular trips to meet with the people there, visiting both our offices in the Netherlands and catching up with a bunch of colleagues in both locations before making my way to Belgium for the main event.

    The conference itself was very well organised and the venue worked very nicely. I liked the hotel (I’m accustomed to London hotel rooms so European ones always seem huge), which was nice and had an English slant on breakfast since sausages were available alongside the cheese and pastries! The venue itself was just across the car park and had plenty of rooms with an open exhibition space which worked nicely – the two tracks were on opposite sides of this space so the footfall for the exhibitors was hopefully good! Full marks go to the crew:

    [Photo: the phpbnl10 crew]

    I gave my talk “Passing the Joel Test in the PHP World” with some updates since I first gave it at PHPNW09 in Manchester. This is a nice best practices talk and although I didn’t have a lot of people in my talk, this was no surprise since Ivo was speaking in the same slot as me with his “PHP and the Cloud” talk, which I STILL haven’t seen! If you are interested my slides are here: http://www.slideshare.net/lornajane/passing-the-joel-test-in-the-php-world-phpbnl10 Thanks to my audience who were great and managed to stay enthusiastic despite my nerves and the late afternoon slot :)

    Here’s to PHPBenelux 2011!

  • [Lorna Mitchell] Speaking at SuperMondays February 3, 2010

    I’m delighted to announce that the people at SuperMondays in Newcastle have invited me to speak at their event on 22nd February. For this I’ll be writing a new talk entitled “PHP and Web Services: Perfect Partners” – looking at how PHP is a good fit for web services and how I’m using it both in my day job and in my hobby projects. Visit the event page itself for the full description, a bit about me, and the arrangements for the night. I am warned that they have limited capacity so although admission is free, if you want to go you should register for tickets ASAP!

    If you are attending, let me know and come and say “hi” to me on the night! I don’t know this crowd well but so far they are pretty friendly and I’m looking forward to the trip north :)

  • [Lorna Mitchell] Stopping CodeIgniter from Escaping SQL January 29, 2010

    I’m adding some small features to the API for joind.in when I have a moment and this is my first experience of working with CodeIgniter. I’ve been getting increasingly impatient with its tendency to try to escape my SQL code for me – this is a really useful default feature but it seems to assume I don’t know what I’m doing and so it puts backticks all over perfectly acceptable SQL code, very annoying!

    One night when I was getting exasperated with it tangling up my SQL expressions, I tweeted my frustration in the hope that I was just missing something simple. A prompt reply from @damiangostomski told me that this was indeed the case … I dug around for the API docs on CodeIgniter – it’s an established framework and has a good reputation. I knew it would have API docs even though I hadn’t used the framework before, and I found them:

    $this->db->select() accepts an optional second parameter. If you set it to FALSE, CodeIgniter will not try to protect your field or table names with backticks. This is useful if you need a compound select statement.

    That quote is from this API docs page – so a big thank you to Damian for replying to me on Twitter, and to the good people at CodeIgniter for adding a useful option to their framework and documenting it so nicely :)
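
    To show why automatic backticks get in the way of compound expressions, here is a toy illustration. The toySelect() function is a hypothetical stand-in written for this post, not CodeIgniter’s code; it naively wraps each comma-separated field in backticks unless the second argument is false, which is roughly the kind of behaviour the quoted option switches off:

```php
<?php
// Toy stand-in for a query builder's select(); this is NOT CodeIgniter's code.
// It wraps each comma-separated field in backticks unless $protect is false.
function toySelect($fields, $protect = true)
{
    $parts = array();
    foreach (explode(',', $fields) as $field) {
        $field = trim($field);
        $parts[] = $protect ? '`' . $field . '`' : $field;
    }
    return 'SELECT ' . implode(', ', $parts);
}

// Plain column names are quoted harmlessly.
echo toySelect('title, speaker'), "\n";

// An expression gets wrapped as if it were a single column name.
echo toySelect('COUNT(*) AS total'), "\n";

// Turning protection off leaves the expression alone.
echo toySelect('COUNT(*) AS total', false), "\n";
```

    In CodeIgniter itself, going by the documentation quoted above, the equivalent would be passing FALSE as the second parameter to $this->db->select().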

  • [Lorna Mitchell] Speaking at PHPNW February January 25, 2010

    If anyone is able to make it to the PHPNW User Group meet in Manchester next Tuesday 2nd February – I’m the speaker there! I’ll be giving a talk entitled “Best Practices for Web Service Design”, which covers lots of information about web services and how to write one that your users will love! Details of the event are over on upcoming, where you can find out more about the talks, the venue and the group as a whole. If you’re able to make it then I’ll see you there – it’s a good crowd :)

  • [Ian Barber] Bayesian Opinion Mining January 23, 2010

    [Photo: A bigger type of miner]

    The web is a great place for people to express their opinions, on just about any subject. Even the professionally opinionated, like movie reviewers, have blogs where the public can comment and respond with what they think, and there are a number of sites that deal in nothing more than this. The ability to automatically extract people’s opinions from all this raw text can be a very powerful one, and it’s a well studied area – no doubt because of the commercial possibilities.

    Opinion mining, or sentiment analysis, is far from a solved problem though. People often express more than one opinion “the movie was terrible, but DeNiro’s performance was superb, as always“, use sarcasm “this is probably the best laptop Dell could come up with“, or use negation and complex devices that can be hard to parse “not that I’m saying this was a bad experience“.

    On top of this, expressions of sentiment tend to be very topic focused – what works for one subject might not work for another. To use a well worn example, it’s a good thing to say that the plot of a movie is unpredictable, but a bad thing to say it about the steering of a car. Even within a certain product, the same words can describe opposite feeling about different features – it’s bad for the start-up time on a digital camera to be long, but it’s good for the battery life to be long. This is why a great deal of work, particularly in product reviews, is spent in classifying which element of a product is being talked about, before starting the opinion mining process.

    At the movies

    We’ll start with a simpler approach, and look at movie reviews. Luckily for us these are fairly easily available online from places like Rotten Tomatoes and IMDB, and indeed a convenient data set of sentences expressing positive and negative opinions has already been compiled. We’re using opinions expressed on the sentence level in order to give ourselves a little more granularity – while most movie reviews are longer than this, they will also usually express more than one opinion, and keeping our document unit smaller helps us avoid muddying the waters.

    The data is supplied as two files, one for positive opinions and the other negative, with one sentence per line, which makes it easy to parse. To actually extract the opinion, we’re going to make use of a classic and well known tool, a Naive Bayesian classifier. These were all the rage for spam filters a couple of years back, and are still a hugely popular way of doing filtering. They have the advantage that they’re easy to implement, pretty effective, and quick to classify with.

    Naive Bayes

    Bayesian classifiers are based around the Bayes rule, a way of looking at conditional probabilities that allows you to flip the condition around in a convenient way. A conditional probability is the probability that event X will occur, given the evidence Y. That is normally written P(X | Y). The Bayes rule allows us to determine this probability when all we have is the probability of the reverse condition, P(Y | X), and of the two components individually: P(X | Y) = P(X)P(Y | X) / P(Y). This restatement can be very helpful when we’re trying to estimate the probability of something based on examples of it occurring.
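
    As a quick sanity check of the rule, here is a small worked example. All of the numbers are made up purely for illustration: X stands for “the sentence is positive” and Y for “the sentence contains some particular word”.

```php
<?php
// Made-up probabilities, for illustration only.
$pX           = 0.5;   // P(X): half of all sentences are positive
$pYgivenX     = 0.02;  // P(Y | X): the word appears in 2% of positive sentences
$pYgivenNotX  = 0.005; // P(Y | not X): and in 0.5% of negative ones

// P(Y) is the chance of seeing the word at all, summed over both cases.
$pY = $pX * $pYgivenX + (1 - $pX) * $pYgivenNotX;

// Bayes rule: P(X | Y) = P(X) P(Y | X) / P(Y)
$pXgivenY = $pX * $pYgivenX / $pY;

printf("P(X | Y) = %.2f\n", $pXgivenY);
```

    With these figures, seeing the word moves the estimate from the prior of 0.5 up to 0.8.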

    In this case, we’re trying to estimate the probability that a document is positive or negative, given its contents. We can restate that so that it is in terms of the probability of that document occurring if it has been predetermined to be positive or negative. This is convenient, because we have examples of positive and negative opinions from our data set above.

    The thing that makes this a “naive” Bayesian process is that we make a big assumption about how we can calculate the probability of the document occurring: that it is equal to the product of the probabilities of each word within it occurring. This implies that there is no link between one word and another word. This independence assumption is clearly not true: there are lots of words which occur together more frequently than either does individually, or with other words, but this convenient fiction massively simplifies things for us, and makes it straightforward to build a classifier.

    We can estimate the probability of a word occurring given a positive or negative sentiment by looking through a series of examples of positive and negative sentiments and counting how often it occurs in each class. This is what makes this supervised learning – the requirement for pre-classified examples to train on.

    So, our initial formula looks like this.

    P(sentiment | sentence) = P(sentiment)P(sentence | sentiment) / P(sentence)

    We can drop the dividing P(sentence), as it’s the same for both classes, and we just want to rank them rather than calculate a precise probability. We can use the independence assumption to let us treat P(sentence | sentiment) as the product of P(token | sentiment) across all the tokens in the sentence. So, we estimate P(token | sentiment) as

    (count(this token in class) + 1) / (count(all tokens in class) + count(all tokens))
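
    To make the estimate concrete, here is a small sketch with the numerator and denominator grouped explicitly. The function name smoothedEstimate() and the counts are hypothetical, chosen only to show what happens for a token that was seen a few times versus one that was never seen in the class:

```php
<?php
// Hypothetical helper: the smoothed estimate of P(token | sentiment).
function smoothedEstimate($tokenInClass, $tokensInClass, $tokensOverall)
{
    return ($tokenInClass + 1) / ($tokensInClass + $tokensOverall);
}

// Made-up counts: 1000 tokens in this class, 2000 tokens overall.
$seen   = smoothedEstimate(3, 1000, 2000); // token seen 3 times in the class
$unseen = smoothedEstimate(0, 1000, 2000); // token never seen in the class

printf("seen: %.6f, unseen: %.6f\n", $seen, $unseen);
```

    The unseen token still gets a small non-zero estimate rather than zero.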

    The extra 1 and count of all tokens is called ‘add one’ or Laplace smoothing, and stops a 0 finding its way into the multiplications. If we didn’t have it, any sentence with an unseen token in it would score zero. We have implemented the above in PHP, in the classify function of the following class:

    <?php
    class Opinion {
            private $index = array();
            private $classes = array('pos', 'neg');
            private $classTokCounts = array('pos' => 0, 'neg' => 0);
            private $tokCount = 0;
            private $classDocCounts = array('pos' => 0, 'neg' => 0);
            private $docCount = 0;
            private $prior = array('pos' => 0.5, 'neg' => 0.5);

            public function addToIndex($file, $class, $limit = 0) {
                    // Validate the class before opening the file, so we never leak a handle
                    if(!in_array($class, $this->classes)) {
                            echo "Invalid class specified\n";
                            return;
                    }
                    $fh = fopen($file, 'r');
                    $i = 0;
                    while($line = fgets($fh)) {
                            if($limit > 0 && $i > $limit) {
                                    break;
                            }
                            $i++;
                           
                            $this->docCount++;
                            $this->classDocCounts[$class]++;
                            $tokens = $this->tokenise($line);
                            foreach($tokens as $token) {
                                    if(!isset($this->index[$token][$class])) {
                                            $this->index[$token][$class] = 0;
                                    }
                                    $this->index[$token][$class]++;
                                    $this->classTokCounts[$class]++;
                                    $this->tokCount++;
                            }
                    }
                    fclose($fh);
            }
           
            public function classify($document) {
                    $this->prior['pos'] = $this->classDocCounts['pos'] / $this->docCount;
                    $this->prior['neg'] = $this->classDocCounts['neg'] / $this->docCount;
                    $tokens = $this->tokenise($document);
                    $classScores = array();

                    foreach($this->classes as $class) {
                            $classScores[$class] = 1;
                            foreach($tokens as $token) {
                                    $count = isset($this->index[$token][$class]) ?
                                            $this->index[$token][$class] : 0;

                                    $classScores[$class] *= ($count + 1) /
                                            ($this->classTokCounts[$class] + $this->tokCount);
                            }
                            $classScores[$class] = $this->prior[$class] * $classScores[$class];
                    }
                   
                    arsort($classScores);
                    return key($classScores);
            }

            private function tokenise($document) {
                    $document = strtolower($document);
                    preg_match_all('/\w+/', $document, $matches);
                    return $matches[0];
            }
    }
    ?>

    The classify function starts by calculating the prior probability (the chance of it being one or the other before any tokens are looked at) based on the number of positive and negative examples – in this example that’ll always be 0.5 as we have the same amount of data for each. We then tokenise the incoming document, and for each class multiply together the likelihood of each word being seen in that class. We sort the final result, and return the highest scoring class.

    The other important method here is addToIndex. All this does is loop over the data, tokenising the documents and storing counts of the terms for later use.
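
    One practical caveat with multiplying many small probabilities together is floating point underflow: for a long enough document the product can collapse to zero for every class. A common workaround is to add logarithms instead of multiplying, which preserves the ranking. The sketch below is not part of the class above; the per-token likelihoods are made-up numbers for a three-token sentence:

```php
<?php
// Multiplying 100 likelihoods of 0.0001 underflows to zero as a float.
$underflow = pow(0.0001, 100);
var_dump($underflow);

// Hypothetical per-token likelihoods for a three-token sentence.
$likelihoods = array(
    'pos' => array(0.004, 0.0007, 0.002),
    'neg' => array(0.001, 0.0009, 0.0005),
);
$prior = array('pos' => 0.5, 'neg' => 0.5);

// Sum the logs instead of multiplying the raw probabilities.
$logScores = array();
foreach ($likelihoods as $class => $probs) {
    $logScores[$class] = log($prior[$class]);
    foreach ($probs as $p) {
        $logScores[$class] += log($p);
    }
}

// The highest log score corresponds to the highest product.
arsort($logScores);
reset($logScores);
$best = key($logScores);
echo $best, "\n";
```

    Because log is monotonic, the class that wins on summed logs is the same one that would win on the product, as long as the product has not underflowed.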

    We can generate a slightly scrubby test set by not quite taking all the data, and using the remaining training examples to test with.

    <?php
    $op = new Opinion();
    $op->addToIndex('opinion/rt-polaritydata/rt-polarity.neg', 'neg', 5000);
    $op->addToIndex('opinion/rt-polaritydata/rt-polarity.pos', 'pos', 5000);
    $i = 0; $t = 0; $f = 0;
    // Test on the negative sentences that were left out of training
    $fh = fopen('opinion/rt-polaritydata/rt-polarity.neg', 'r');
    while($line = fgets($fh)) {
            if($i++ > 5001) {
                    if($op->classify($line) == 'neg') {
                            $t++;
                    } else {
                            $f++;
                    }
            }
    }
    fclose($fh);
    echo "Accuracy: " . ($t / ($t+$f));
    ?>

    This gives an accuracy of around 0.8, which isn’t bad really! To demonstrate it, we can chuck a couple of example sentences in:

    <?php
    $op = new Opinion();
    $op->addToIndex('opinion/rt-polaritydata/rt-polarity.neg', 'neg');
    $op->addToIndex('opinion/rt-polaritydata/rt-polarity.pos', 'pos');
    $string = "Avatar had a surprisingly decent plot, and genuinely incredible special effects";
    echo "Classifying '$string' - " . $op->classify($string) . "\n";
    $string = "Twilight was an atrocious movie, filled with stumbling, awful dialogue, and ridiculous story telling.";
    echo "Classifying '$string' - " . $op->classify($string) . "\n";
    ?>

    Which returns as expected:

    Classifying 'Avatar had a surprisingly decent plot, and genuinely incredible special effects' - pos
    Classifying 'Twilight was an atrocious movie, filled with stumbling, awful dialogue, and ridiculous story telling.' - neg
    

    We can even use it on a longer review, as long as we split into sentences first. I grabbed the review of Avatar from The Scientific Indian.

    <?php
    // … snip … article contents as $op setup
    $sentences = explode(".", $doc);
    $score = array('pos' => 0, 'neg' => 0);
    foreach($sentences as $sentence) {
            if(strlen(trim($sentence))) {
                    $class = $op->classify($sentence);
                    echo "Classifying: \"" . trim($sentence) . "\" as " . $class . "\n";
                    $score[$class]++;
            }
    }
    var_dump($score);
    ?>

    Just to give a snippet of the output, we get:

    Classifying: "Fortunately, the movie's moral premise plays second fiddle to the technical feats" as neg
    Classifying: "I enjoyed the movie" as pos
    Classifying: "The ending is especially poignant" as pos
    Classifying: "The visual effects are spectacular and a lot of the production techniques are a first in the craft of movie making" as pos
    Classifying: "For that alone, the movie is a must see" as pos
    array(2) {
      ["pos"]=>
      int(25)
      ["neg"]=>
      int(11)
    }
    

    So, broadly positive, which is the right direction!
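
    Splitting on every full stop is fairly crude: it ignores question and exclamation marks and cuts decimals such as 7.5 in half. One possible refinement, offered only as a sketch, is to split after sentence-ending punctuation when it is followed by whitespace. The helper name splitSentences() and most of the sample text are made up for illustration, and abbreviations like “Mr. Smith” would still trip it up:

```php
<?php
// Hypothetical helper: split after ., ! or ? when followed by whitespace.
function splitSentences($text)
{
    return preg_split('/(?<=[.!?])\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Sample text, mostly invented for this example.
$doc = "I enjoyed the movie. Was it perfect? No! It scored 7.5 out of 10.";
$sentences = splitSentences($doc);
print_r($sentences);
```

    Each resulting sentence could then be passed to classify() in the same loop as before.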

    More Opinions

    There’s a lot we haven’t addressed in our classifier. We could pass the sentences through a couple of other classifiers first, using Bayesian techniques again, in order to determine some more useful facts. For example, is this even a review? If we just start processing blog posts, for example, we’ll find a lot that mention a movie without actually saying whether it’s good or bad, and we may as well discard those.

    Then, for each sentence, which part of the movie is it talking about? We might be able to correctly interpret a review which slams the actor, slates the script, but was impressed with the special effects. At each stage, the process would be the same as this time – find or create training data, train a classifier, and let it go to work.

    We could also look at more complicated language models and named entity extractors, that allow us to map the odd phrases that sometimes occur, and associate opinions with the appropriate parts of a sentence. This can be a lot more work, but can also lead to higher accuracy and reliability.

    Photo Credit: Grégory Tonon

  • [Lorna Mitchell] Three Ways to Make a POST Request from PHP January 18, 2010

    I’ve been doing a lot of work with services and working with them in various ways from PHP. There are a few different ways to do this: PHP has a curl extension which is useful, and if you can add PECL extensions then pecl_http is a better bet, although there are a couple of different ways of using it. This post shows all of these side by side.

    POSTing from PHP Curl

    This is pretty straightforward once you get your head around the way the PHP curl extension works, combining various flags with setopt() calls. In this example I’ve got a variable $xml which holds the XML I have prepared to send – I’m going to post the contents of that to flickr’s test method.

    $url = 'http://api.flickr.com/services/xmlrpc/';
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $response = curl_exec($ch);
    curl_close($ch);
     

    First we initialised the connection, then we set some options using setopt(). These tell PHP that we are making a post request, and that we are sending some data with it, supplying the data. The CURLOPT_RETURNTRANSFER flag tells curl to give us the output as the return value of curl_exec rather than outputting it. Then we make the call and close the connection – the result is in $response.
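    In practice it helps to know whether the call actually worked. The sketch below wraps the same steps in a function and adds basic error handling. The function name postXml() and the Content-Type header are assumptions made for illustration; they are not part of the original example.

    ```php
    <?php
    // Hypothetical helper (not from the original post): POST an XML string
    // and return the HTTP status code along with the response body.
    function postXml($url, $xml) {
        $ch = curl_init($url);

        curl_setopt($ch, CURLOPT_POST, 1);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $xml);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        // Assumption: the service expects the body to be labelled as XML
        curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));

        $response = curl_exec($ch);
        if ($response === false) {
            // curl_exec() returns false on failure; curl_error() says why
            $error = curl_error($ch);
            curl_close($ch);
            throw new RuntimeException('curl request failed: ' . $error);
        }

        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        return array('status' => $status, 'body' => $response);
    }
    ```

    Calling postXml($url, $xml) then gives back an array, so the caller can check the status before trying to parse the body.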

    POSTing from Pecl_Http

    Pecl_Http has two interfaces – one procedural and one object-oriented; we’ll start by looking at the former. This is even simpler than in curl; here’s the same script translated for pecl_http:

    $url = 'http://api.flickr.com/services/xmlrpc/';

    $response = http_post_data($url, $xml);
     

    This extension has a function that expressly posts a request, and it can optionally accept data to go with it. Very simple and easy.

    POSTing from Pecl_Http: the OO interface

    Finally, let’s see what the OO version of the extension looks like. It makes exactly the same call as both of the examples above, but using the alternative interface means our code looks like this:

    $url = 'http://api.flickr.com/services/xmlrpc/';

    $request = new HTTPRequest($url, HTTP_METH_POST);
    $request->setRawPostData($xml);
    $request->send();
    $response = $request->getResponseBody();
     

    This example is quite a bit longer than the previous one, and you might think this indicates that this approach is more complicated. In some senses that is true, and it’s probably overkill for our extremely trivial example. However, it is worth mentioning that the pecl_http extension is extremely flexible and powerful, and can handle some cases that the curl extension can’t. So even if it looks more complicated here, it can still be an excellent choice to implement.

    In Conclusion

    That was a very fast round-up of three ways you could make an arbitrary web service call from PHP – hopefully these examples are clear and will help anyone just starting to implement something along these lines.

  • [Lorna Mitchell] Dutch PHP Conference: Call for Papers Now Open January 12, 2010

    There is an announcement over on the DPC (Dutch PHP Conference) website – their Call for Papers is now open (so go submit!). What’s remarkable about this announcement is that I wrote it, and it’s signed with my name and the words “Your host this year” … yes, I’m hosting DPC.

    I’m pretty excited about this, I love getting involved with events and I also love DPC as an event, so together these are pretty special. DPC is organised by my employers, Ibuildings – so I actually get paid to get involved with this conference, which is pretty cool :) The submissions have already started coming in to the call for papers and the quality and variety of the talks, from people I know well and others I’ve never heard of, is staggering. I’m hoping that this trend continues right through until the CfP closes on 31st January. The task of choosing the talks will be very difficult but we have a panel of selectors ready to step up to the challenge – and I’m already excited about how good this year’s event is going to be!

  • [Lorna Mitchell] Speaking at TEK·X January 7, 2010

    I’m always pleased to be accepted as a speaker but I’m especially delighted to hear that I’m speaking at TEK·X in Chicago this May. They had a crazy number of submissions for the number of slots available, and I really wanted to go since I spoke there last year and enjoyed the event hugely! This year I’m giving the following sessions:

    PHP Best Practices (tutorial) – This is a half-day tutorial with my good friend Matthew Weier O’Phinney covering all sorts of good stuff you can do when you develop PHP. It’s a general session and the aim is that everyone in the room takes away something new from our tips and tricks (and stories of what *NOT* to do!)

    SVN in a Distributed World – I’m giving this talk for the first time, looking at how traditional source control (subversion) compares with the newer distributed version control solutions (git, bzr). There’s been lots of buzz around git but in the PHP world we choose our tools on merit, not on cool factor, so this is a chance for me to share my experiences with both types of systems and talk a bit about which scenarios the various tools are a good fit for.

    Open Source Your Career – Another new talk! This one is about how much personal gain there is in being an open source contributor. I’ve grown hugely, both personally and professionally, from my experience with user groups, events, and software in the open source space – so I’ll be sharing some tips on how things can work out well all round.

    If you’re going to the conference, then do make sure to stop me and say “hi” – there are so many people at these events that sometimes I miss out on meeting people I’d like to have spoken to. You can’t miss me, I’m the woman with the English accent and curly hair!! I had an absolutely great time last year and I’m already looking forward to this year’s conference!

  • [Ian Barber] PageRank In PHP December 16, 2009

    Google was a better search engine than its predecessors for a number of reasons, but probably the most well known one is PageRank, the algorithm for measuring the importance of a page based on what links to it. Though not necessarily that useful on its own, this kind of link analysis can be very helpful as part of a general information retrieval system, or when looking at any kind of network, such as a friend graph from a social network.

    Larry Page came up with PageRank as a way of measuring the importance of a page while at Stanford in the mid nineties. The idea is that there is a random surfer, who starts on a web page then browses through in a somewhat random way. For each page the surfer visits he randomly chooses one of the links from that page and follows it. From time to time though he gets bored, or hits a page with no links on it, and so jumps to some different page and starts the process again. The PageRank of a page is the proportion of his time that the surfer spends on the page.

    This random walk is a Markov chain, a traversal of a matrix that gives the probability of moving to any state from any other state, with the total probabilities for the destinations from any given page adding up to 1. PageRank attempts to extract a vector of weights from that table, with one weight per page. The weights aren’t related to any particular query, so they are only ever going to be one factor in a page being returned on a results page. However, all else being equal, a page with higher PageRank should be a better match than a page with lower PageRank.

    As an aside, looking at links is also convenient for one of the other key advances Google made – indexing pages based on the text of anchors pointing towards the page. This helps in cases where a term that the page might not use itself is searched for – for example the phrase ‘big blue’ for IBM is unlikely to be on their site, but will be in some links to that site. It also helps for resources which are hard to index themselves, such as videos and images, meaning they can be included in the results.

    The general idea then of PageRank is pretty simple – a page hands out its PageRank to the pages it links to – but it leads to a chicken and egg situation. At the start of the process the page has no PageRank to hand out, just whatever it was initialised to. This means that the algorithm must be an iterative process, which converges on the true values.

    Each page is initially given a PageRank of 1 divided by the total number of pages. At each step this is handed out to all the pages that it links to, and the base PageRank for each page is replaced with the total amount it received, adjusted by the damping factor to reflect the possibility of our random surfer typing a new URL in. We loop this until the PageRank is pretty stable, which is normally in the region of 30-50 iterations.
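    To make a single step concrete, here is a minimal sketch of one iteration on a made-up three page graph. The graph and the variable names are assumptions for illustration only, and the damping value of 0.15 is treated as the chance that the surfer jumps to a random page.

    ```php
    <?php
    // Toy graph, invented for this example: page => pages it links to
    $links = array(
        'A' => array('B', 'C'),
        'B' => array('C'),
        'C' => array('A'),
    );
    $damping = 0.15; // chance of jumping to a random page
    $n = count($links);

    // every page starts with an equal share of 1/n
    $rank = array_fill_keys(array_keys($links), 1 / $n);

    // hand each page's rank out evenly along its outbound links
    $received = array_fill_keys(array_keys($links), 0.0);
    foreach ($links as $page => $outbound) {
        foreach ($outbound as $target) {
            $received[$target] += $rank[$page] / count($outbound);
        }
    }

    // combine what was received with the random jump share
    $next = array();
    foreach ($received as $page => $score) {
        $next[$page] = $damping / $n + (1 - $damping) * $score;
    }

    foreach ($next as $page => $score) {
        printf("%s: %.4f\n", $page, $score);
    }
    ```

    After this one step page C is already ahead at 0.4750, because it receives rank from both A and B, while A holds 0.3333 and B drops to 0.1917. Repeating the step with $next as the new starting ranks is what the iterative process does.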

    If we assume that we have already done our crawling and parsing steps, we will be left with an index of document IDs for different pages, and the document IDs of the pages that they link to. We don’t necessarily need the whole link graph in memory, but for the example we’ll assume that it does fit – this is patently not true for truly webscale items where we are likely to be talking billions or trillions of links.

    <?php
    $links = array(
            1 => array(5),
            2 => array(4, 7, 8),
            3 => array(1, 3, 4, 7, 9),
            4 => array(1, 2, 4, 8),
            5 => array(1, 6, 7, 9),
            6 => array(1, 5, 8),
            8 => array(3, 4),
            9 => array(1, 4, 6, 8)
    );
    ?>

    If we look at this link graph, a few things stand out. Page 1 is pointed to by almost everything, so is presumably very good. Page 4 is also very popular, so should be expected to rank well.
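    Those observations can be checked by counting inbound links. The snippet below is a small sketch that reuses the $links array from above; the variable names $inbound and $dangling are made up for the example. Note that it counts a page linking to itself, as pages 3 and 4 do, as an inbound link.

    ```php
    <?php
    $links = array(
        1 => array(5),
        2 => array(4, 7, 8),
        3 => array(1, 3, 4, 7, 9),
        4 => array(1, 2, 4, 8),
        5 => array(1, 6, 7, 9),
        6 => array(1, 5, 8),
        8 => array(3, 4),
        9 => array(1, 4, 6, 8)
    );

    // count how many pages link to each page
    $inbound = array();
    foreach ($links as $page => $outbound) {
        foreach ($outbound as $target) {
            if (!isset($inbound[$target])) {
                $inbound[$target] = 0;
            }
            $inbound[$target]++;
        }
    }
    arsort($inbound); // most linked-to pages first

    // pages that are linked to but have no entry of their own
    $dangling = array_diff(array_keys($inbound), array_keys($links));

    print_r($inbound);
    print_r($dangling);
    ```

    Pages 1 and 4 come out on top with five inbound links each. The check also shows that page 7 is linked to three times but has no entry of its own in the graph, so it never receives a rank in the results.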

    We can then calculate the PageRank by running our iterative process. We’ve got a hard limit of 100 iterations, but we also keep track of the amount of change between one set of PageRanks and the next. If that change drops below a set level (here 0.00005) the process decides it’s stable enough and stops.

    <?php
    function calculatePageRank($linkGraph, $dampingFactor = 0.15) {
            $pageRank = array();
            $tempRank = array();
            $nodeCount = count($linkGraph);

            // initialise the PR as 1/n
            foreach($linkGraph as $node => $outbound) {
                    $pageRank[$node] = 1/$nodeCount;
                    $tempRank[$node] = 0;
            }

            $change = 1;
            $i = 0;
            while($change > 0.00005 && $i < 100) {
                    $change = 0;
                    $i++;

                    // distribute the PR of each page
                    foreach($linkGraph as $node => $outbound) {
                            $outboundCount = count($outbound);
                            foreach($outbound as $link) {
                                    $tempRank[$link] += $pageRank[$node] / $outboundCount;
                            }
                    }
                   
                    $total = 0;
                    // calculate the new PR using the damping factor
                    foreach($linkGraph as $node => $outbound) {
                            $tempRank[$node]  = ($dampingFactor / $nodeCount)
                                                    + (1-$dampingFactor) * $tempRank[$node];
                            $change += abs($pageRank[$node] - $tempRank[$node]);
                            $pageRank[$node] = $tempRank[$node];
                            $tempRank[$node] = 0;
                            $total += $pageRank[$node];
                    }

                    // Normalise the page ranks so it’s all a proportion 0-1
                    foreach($pageRank as $node => $score) {
                            $pageRank[$node] /= $total;
                    }
            }
           
            return $pageRank;
    }
    ?>

    The results we get are pretty much as we expect:

    <?php
    array(8) {
      [1]=>
      float(0.17084756816354)
      [2]=>
      float(0.060138597093041)
      [3]=>
      float(0.095270626028832)
      [4]=>
      float(0.17308244947008)
      [5]=>
      float(0.20422266588834)
      [6]=>
      float(0.086831014221959)
      [8]=>
      float(0.12476184607085)
      [9]=>
      float(0.084845233063364)
    }
    ?>

    Pages 1 and 4 rank highly, as we expected, but page 5 gets the best rank of all because it is the only thing linked to by page 1.

    We can apply this algorithm to any graph where the links between nodes have a direction from one to the other, such as determining important users on twitter by looking at @references between them, for example. Once we start calculating this data it’s easy to start looking for different types of nodes, like hubs (outbound links to mostly pages with high PageRank) and authorities (lots of inbound links from pages with high PageRank), or calculating PageRank within certain topic groupings, all of which can contribute to results ranking, and a more effective search.
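    As a rough illustration of the twitter case, the sketch below turns a handful of made-up tweets into the same page => links structure used for the link graph earlier. The sample data and the function name buildMentionGraph() are assumptions, not part of any real API.

    ```php
    <?php
    // Invented sample data: author => list of tweets
    $tweets = array(
        'alice' => array('Great talk by @bob today', 'Agreeing with @carol and @bob'),
        'bob'   => array('Thanks @alice!'),
        'carol' => array('Reading @bob on PageRank'),
    );

    // Hypothetical helper: build author => mentioned users
    function buildMentionGraph(array $tweets) {
        $graph = array();
        foreach ($tweets as $author => $messages) {
            $graph[$author] = array();
            foreach ($messages as $message) {
                // capture the user name after each @
                preg_match_all('/@(\w+)/', $message, $matches);
                foreach ($matches[1] as $mentioned) {
                    $mentioned = strtolower($mentioned);
                    // skip self-references and repeated mentions
                    if ($mentioned != $author && !in_array($mentioned, $graph[$author])) {
                        $graph[$author][] = $mentioned;
                    }
                }
            }
        }
        return $graph;
    }

    $graph = buildMentionGraph($tweets);
    print_r($graph);
    ```

    The resulting array has the same shape as the link graph, so it could be passed to the PageRank function as it stands. Users who are mentioned but never tweet would need entries of their own adding first.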

  • [Ian Barber] Speaking at PHPUK and Sogeti December 11, 2009

    For anyone interested in such matters, I’ll be speaking at a couple of events in February next year which could provide valuable Learning Experiences for your continual professional development!

    First up, on the 6th of February near Utrecht in the Netherlands, is Sogeti Engineering World, where I’ll be talking about document classification in PHP. This includes discussions of why you might want to classify things and why you’d do it in PHP, and then looks at algorithms including a Bayesian classifier, K-Nearest Neighbour and ID3 decision trees. There are some great talks on the schedule, including my main man Felix de Vliegher talking about gearman, and a whole bunch of other interesting stuff, some of which may contain Dutch. I’m quite looking forward to “Object Oriented Cobol” personally.

    After that I’m speaking at the always lovely PHP UK Conference, in sunny London on the 26th of February. There I’ll be talking about integrating search engines into your applications, and we’ll cover a bunch of interesting bits and pieces on how search works, and how you can provide better results than anyone (on your own site at least). The line up is top notch, with gurus abounding, including Lorenzo “Hollywood” Alberton talking about doing trendy things with databases, Lorna talking about webservices, the leads from Zend Framework and Symfony not talking about their frameworks, JRF talking about regex, Derick talking about D-bus, Stefan talking about patterns and Skoop talking about documentation.

  • [Lorna Mitchell] Speaking at PHPUK December 11, 2009

    I’m pleased to announce that this year I’ll be speaking at PHPUK in London in February. I’ve attended this conference for the last three years, and attend its related user group, PHP London whenever I can find a reason to be in London on the right day. My talk this time is a brand new one, “Best Practices for Web Service Design”, which covers the main points (and pitfalls!) of architecting a web service to be as robust and useful as possible. This is something I’ve been doing quite a bit of in my day job lately and I’m hoping to pass on some of what I’ve learned.

    This conference is well-established and I’ve had a blast most years I’ve attended! Although their schedule isn’t public yet (it will be soon), the other sessions I’ve heard about on the grapevine sound good. If you want to attend, the date is Friday 26th February and you can buy your tickets on their site. Let me know if I will see you there :)

  • [Ian Barber] Text Generation December 9, 2009

    After a rather technical post last week, something a bit lighter. Text and language generation is a fun topic with applications that run from randomly generating scientific papers for conferences, to the practical tasks of generating speech and automated responses. In this post we’ll look at how we can generate some nonsense text based on existing documents, which isn’t on the overly practical side, though it can make a fun change from Lorem Ipsum for holding copy. The code is throughout, but you can also grab the lot in a zip.

    The basic idea is to extract a set of probabilities of certain words appearing, and then generate a document based on this probability model. We’ll be using pairs of words as our statistical unit of choice, making this a two-word language model. This is really using a Markov chain by modelling, for any given word in the text, what the probabilities are of moving to another word. You can imagine a Markov chain as a big table of states. By reading down each column you can see, for the state represented by that column, what the probability is of moving to any of the other states. This means that if we generate text based on walking through this table then, much like the real world, it’s hard to predict the exact outcome, but easy to predict the statistical properties.

    The way we do this is really pretty simple: tokenise the text, then keep a count for each pair of tokens. For each word, we normalise the counts of the words that follow it to get a probability of moving from the first word to each of the second words. For generating the text itself we just choose a random first word, get the list of paired words and choose a random second word. We keep outputting them until we hit our word limit. In both these cases “random” means a random point 0-1, which we look up in the distribution to find the word – so more common words are more likely to come up.
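    The counting and normalising step can be tried on a tiny made-up sentence. This is only a sketch with invented variable names, and it splits on spaces rather than tokenising properly.

    ```php
    <?php
    // Invented example text, split on spaces for simplicity
    $text = 'the cat sat on the mat the cat ran';
    $tokens = explode(' ', $text);

    // count each pair of adjacent words
    $counts = array();
    for ($i = 0; $i < count($tokens) - 1; $i++) {
        $first = $tokens[$i];
        $second = $tokens[$i + 1];
        if (!isset($counts[$first][$second])) {
            $counts[$first][$second] = 0;
        }
        $counts[$first][$second]++;
    }

    // turn the counts for each first word into probabilities
    $probabilities = array();
    foreach ($counts as $first => $followers) {
        $total = array_sum($followers);
        foreach ($followers as $second => $count) {
            $probabilities[$first][$second] = $count / $total;
        }
    }

    print_r($probabilities['the']);
    ```

    Here “the” is followed by “cat” twice and “mat” once, so the model moves from “the” to “cat” with probability two thirds and to “mat” with probability one third.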

    PHP Text Generator

    The class below does exactly this. When given a file to generate text from, as an argument to the learn function, it first tokenises the file then stores pairs in an array of [first word][second word]. Each pair is associated with a count, which is then normalised between 0 and 1 for each set – so for any given word we have the probability of moving to each of its possible pairings, with the possibility of going to any words not seen paired implicitly being zero. This is accompanied by an array, rootScores, for all the first words – so that for the start of the algorithm or if we hit a sentence ender like ‘.’, we can choose a new word to start generating from.

    The generate function performs the random walk through the probabilities. At each step it tries to pick a word from the pairs for the current word, and it tries to ensure that that word isn’t the same as the one it started with, to avoid very small loops. If the current word doesn’t have any pairs, or isn’t set (in the case of the first word or sentence enders), then it picks a starting word from rootScores. The actual picking is done by the pick function, which just generates a random float between 0 and 1 and returns the corresponding item from the passed array.
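    The normalising and picking steps are easier to follow in isolation. The sketch below uses two standalone functions, cumulative() and pickAt(), which are hypothetical stand-ins for the class methods; pickAt() takes the random point as an argument so the result is predictable.

    ```php
    <?php
    // Turn counts into running totals between 0 and 1
    function cumulative(array $counts) {
        $total = array_sum($counts);
        $running = 0;
        foreach ($counts as $key => $count) {
            $running += $count / $total;
            $counts[$key] = $running;
        }
        return $counts;
    }

    // Return the first key whose running total is above the point
    function pickAt(array $cumulative, $point) {
        foreach ($cumulative as $key => $upper) {
            if ($point < $upper) {
                return $key;
            }
        }
        // rounding can leave the point at or above the final total
        end($cumulative);
        return key($cumulative);
    }

    $dist = cumulative(array('cat' => 2, 'mat' => 1, 'dog' => 1));
    // cat covers 0 to 0.5, mat 0.5 to 0.75, dog 0.75 to 1

    echo pickAt($dist, 0.3), "\n"; // cat
    echo pickAt($dist, 0.6), "\n"; // mat
    echo pickAt($dist, 0.9), "\n"; // dog
    ```

    Because “cat” owns half of the range, a random point lands on it about half of the time, which is how more common words end up being chosen more often.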

    <?php
    class LangGen {
            protected $model = array();
            protected $rootScores = array();
            protected $sentenceEnd = array('.', '!', '?');
            protected $joinSentence = array(',', ':', ';');

            public function learn($filePath) {
                    $contents = strip_tags(file_get_contents($filePath));
                    $tokens = $this->tokenise($contents);
                    unset($contents);
                   
                    $prevToken = null;
                    foreach($tokens as $token) {
                            if($prevToken) {
                                    if(!isset($this->model[$prevToken])) {
                                            $this->model[$prevToken] = array();
                                    }
                                    if(!isset($this->model[$prevToken][$token])) {
                                            $this->model[$prevToken][$token] = 0;
                                    }
                                    $this->model[$prevToken][$token]++;
                            }
                            $prevToken = $token;
                           
                            // handle sentence enders
                            if(in_array($token, $this->sentenceEnd)) {
                                    $prevToken = null;
                            } else {
                                    if(!isset($this->rootScores[$token])) {
                                            $this->rootScores[$token] = 0;
                                    }
                                    $this->rootScores[$token]++;
                            }
                    }
                    unset($tokens);
                   
                    // normalise probabilities
                    foreach($this->model as $key => $tokens) {
                            $this->model[$key] = $this->probNormalise($tokens);
                    }
                    $this->rootScores = $this->probNormalise($this->rootScores);
            }
           
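            public function generate($length = 100) {
                    $return = array();
                    $word = null;  
                    for($i = 0; $i < $length; $i++) {
                            // isset() avoids a notice when $word is null or has no pairs
                            if($word !== null && isset($this->model[$word])) {
                                    do {
                                            $return[$i] = $this->pick($this->model[$word]);
                                            // stop retrying if the word can only be followed by itself
                                    } while($word == $return[$i] && count($this->model[$word]) > 1);
                                    $word = $return[$i];
                            } else {
                                    $return[$i] = $word = $this->pick($this->rootScores);
                            }
                    }
                    return $this->generateString($return);
            }      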
           
            protected function generateString(array $words) {
                    $words[0] = ucwords($words[0]);
                    foreach($words as $key => $word) {
                            if(in_array($word, $this->sentenceEnd)) {
                                    $words[$key-1] .= $word;
                                    unset($words[$key]);
                                    $words[$key+1] = ucwords($words[$key+1]);
                            } else if(in_array($word, $this->joinSentence)) {
                                    if(strlen($words[$key-1])) {
                                            $words[$key-1] .= $word;
                                    }
                                    unset($words[$key]);
                            }
                    }
                    return implode(' ', $words);
            }
           
            protected function probNormalise($array) {
                    $total = array_sum($array);
                    $runningScore = 0;
                    foreach($array as $key => $score) {
                            $runningScore += ($score/$total);
                            $array[$key] = $runningScore;
                    }
                    return $array;
            }

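            protected function pick($array) {
                    $floatRand = rand(0, 1000000) / 1000000.0;
                    foreach($array as $key => $value) {
                            if($floatRand < $value) {
                                    return $key;
                            }
                    }
                    // $floatRand can reach the final cumulative value, so fall back to the last key
                    end($array);
                    return key($array);
            }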

            protected function tokenise($string) {
                    preg_match_all("/[\'|\w]+|[\:|\;|\.|\?|\!|\,]/", $string, $matches);
                    foreach($matches[0] as $id => $match) {
                            if(is_numeric($match)) {
                                    unset($matches[0][$id]);
                            } else {
                                    $matches[0][$id] = strtolower($match);
                            }
                    }
                    return $matches[0];
            }
    }
    ?>

    The tokeniser is a bit different from previous ones, as we want to specifically separate out punctuation – we don’t really have to do this as actually leaving punctuation intact gives a very nice bit of text generation, but it’s interesting to view the relationship between these different types of punctuation and the words of the text. For a change we’ve used a much longer example text, Jane Austen’s Pride & Prejudice from Project Gutenberg.

    <?php
    $langGen = new LangGen();
    $langGen->learn('1342.txt');
    echo $langGen->generate();
    ?>

    “Hurst, when at her bracelets and when, was there an opportunity. Of admiration, lizzy, to see a charming, but she could be as politely by the reason for exertion of their own happiness overflows in his wishing them, gave them. Has been presented? I may depend on the wedding need not immediately on you have employment, in time before you out to stay, we are much of the preference of her desire of the hearth, which will keep winking at hunsford between them to make her brother.”

    However, you can get entertaining results on much smaller blocks of text, such as on websites. For example, from a page of the excellent Eloquent Javascript site.

    <?php
    $langGen = new LangGen();
    $langGen->learn('http://eloquentjavascript.net/chapter6.html');
    echo $langGen->generate();
    ?>

    Stroustrup is often useful. So on three values, map takes two hours in the biggest kind of discarding the introduction to successful html documents, you have read like this to say ‘5 10′ in the syntax here is chosen. That strings creates a function. The computer programming, b; else is the code. Href: footnote number footnote var footnotes are dry kind of a big strings. Function. Look embarrassingly amateurish. P: var paragraphs are getting entirely. The expressions inside the secret to baffle us enough.

    Grammar

    One thing we aren’t considering is any form of grammar. The simplest way we could approach this would be just to add the part of speech to each word and run the same process – we’d want a larger body of text, but it might lend something to the process. We’ll be using the PoS tagger from an earlier post, but because there are a couple of minor modifications it’s included in this zip of the code.

    <?php
    class PosLangGen extends LangGen {
            private $tagger;
           
            public function __construct($lexicon = 'lexicon.txt') {
                    $this->tagger = new PosTagger($lexicon);
            }
           
            protected function tokenise($contents) {
                    $tokens = parent::tokenise($contents);
                    $tags = $this->tagger->tag($tokens);
                    foreach($tokens as $i => $token) {
                            $return[] = $token . "/" . $tags[$i];
                    }
                    unset($tokens);
                    unset($tags);
                    return $return;
            }
           
            protected function generateString(array $words) {
                    foreach($words as $key => $word) {
                            list($word, $tag) = explode("/", $word);
                            $words[$key] = $word;
                    }
                    return parent::generateString($words);
            }
    }
    ?>

    “Was. Darcy? Not my share of books were not long wished to inform us. On mrs. Lady catherine, by no means satisfy her friends than lovely and see them with mutual civility that these are conditions which often told me laugh at such a subject, indeed had passed some delicacy restrained her younger sisters, you for a colder voice whether she felt that is impossible for exposing herself. She then accompanied her. Tell you must not to another man from something, said elizabeth was out very soon began to make”

    Send Mr Change for observations

    The other approach we could take would be to try to extract a grammar model from the text, then generate a sequence of tags based on that model. We could then look up which words should fill each tag based on their own probabilities. This is actually another pretty easy tweak on top of the code we already have.

    <?php
    class PosGramGen extends LangGen {
            private $tagger;
            private $types;
           
            public function __construct($lexicon = 'lexicon.txt') {
                    $this->tagger = new PosTagger($lexicon);
            }
           
            protected function tokenise($contents) {
                    $tokens = parent::tokenise($contents);
                    unset($contents);
                    $tags = $this->tagger->tag($tokens);
                    foreach($tags as $i => $tag) {
                            if(!isset($this->types[$tag])) {
                                    $this->types[$tag] = array();
                            }
                            if(!isset($this->types[$tag][$tokens[$i]])) {
                                    $this->types[$tag][$tokens[$i]] = 0;
                            }
                           
                            $this->types[$tag][$tokens[$i]]++;
                            $return[] = $tag;
                    }
                    unset($tokens);
                    unset($tags);
                    foreach($this->types as $key => $types) {
                            $this->types[$key] = $this->probNormalise($types);
                    }
                    return $return;
            }
           
            protected function generateString(array $words) {
                    foreach($words as $key => $tag) {
                            $words[$key] = $this->pick($this->types[$tag]);
                    }
                    return parent::generateString($words);
            }
    }
    ?>

    “Compatible palings air you is moreover, laughed passed to pass. Of wickham, what rest jane; whether feelings; and you was, had chiefly though love tenants latter mr, to send mr change for observations of the marriage of your darcy’s pressed to the one with i if her half been contrary husband have satisfied types, who it may so to it but wickham employment of sentiment, mr side without either one obliged it pretty feel and their much spirits education me is at rather of sister jane. She; and her”

    I couldn’t say it’s any better than the others. With some work on the word choice logic you may be able to get it somewhere in the region of making sense, but whether that’s actually worth doing is entirely down to your own sensibilities.

  • [Vincenzo Russo] Programmatic CCK Content Types: why my way? December 7, 2009

    A few days ago, I announced the Alternate Content Copy module, which will hopefully be accepted on the Drupal.org project repository (as soon as I submit it).

    In that article I explained the general reasons why those modifications should be part of the official Content Copy module.

    Now, I do realise that Fields in D7 might be so different that everything I am about to say may not make any sense in the near future. However, with this article I would like to explain why the programmatic update of CCK content types benefits from my modifications.

    Programmatic CCK content types update: Markus’ way

    In another post I explained how to get programmatic CCK content types done, including a programmatic update. My way of doing this last bit only works if the Alternate Content Copy module is used.

    markus_petrus claims that my way is neither the safest nor the correct way to go about that. Instead, the CCK CRUD APIs should be used in hook_update_N(), in the same place where the Schema APIs would be used for regular Drupal content types.

    Basically, the procedure suggested by markus_petrus can be summarised as follows:

    • Update the content type using the UI to reconfigure the structure and export the new definition to code (the .def.inc file in my example);
    • Write a new hook_update_N using CCK CRUD API to mirror the new definition.

    At this point both a fresh install of the module or an update of it should take us to the same version of the content type.
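    As a rough sketch of the second step, an update function written this way might look like the following for Drupal 6. The module name mymodule, the update number, and every field and content type name are made up for illustration; only the CCK function content_field_instance_create() and its include file come from the CCK CRUD API.

    ```php
    <?php
    /**
     * Hypothetical update: add a text field to an existing content type.
     */
    function mymodule_update_6001() {
      $ret = array();

      // The CCK CRUD functions live in an include file that is not loaded by default.
      module_load_include('inc', 'content', 'includes/content.crud');

      // All names and settings below are invented for this example.
      $field = array(
        'field_name' => 'field_subtitle',
        'type_name' => 'article',
        'type' => 'text',
        'widget_type' => 'text_textfield',
        'label' => 'Subtitle',
      );
      content_field_instance_create($field);

      return $ret;
    }
    ```

    The field array has to be kept in step by hand with the exported definition, which is the point where a fresh install and an update could drift apart.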

    This is fine and it resembles exactly what one would be doing for a regular content type:

    • Update the content type schema within the hook_install;
    • Write a new hook_update_N to mirror the changes made to the schema definition at the previous point.

    My way: less error prone

    By using my solution to implement CCK content types update, we can improve things in two ways.

    Firstly, we make the updates less error prone, as both the hook_install and the various hook_update_N implementations rely on the same content type definition, so the update procedure and the installation procedure cannot end up with different results. No CRUD API is (directly) involved, so no human error can creep in while writing calls to it in the hook_update_N that would make an update differ from a fresh installation.

    Secondly, we make the job easier for the developers, because all they need to do is export the new definition to code and then add a new hook_update_N that will have a standard format, i.e. all the hook_update_N implementations will look the same.

    CRUD APIs vs Alternative Content Copy. Really?

    I have been writing this article so far because of a supposed divergence of opinions about how to implement CCK content types updates programmatically. Well, believe it or not, that divergence is imaginary.

    The modifications to the import procedure that you can find in the Alternate Content Copy module make use of exactly those CCK CRUD APIs. Also, I have been comparing my solution to the CCK_sync module. At the end of the day, we do exactly the same thing, but the latter has been referred to as one possible way to go for those who require a proper update of the whole content type.

    The only real difference between CCK_sync and Alternate Content Copy is that the former provides the functionality as alternative functions to be called via code and/or a separate interface for “synchronising” the content types via UI. My solution, instead, alters the already existing functions and APIs, integrating into CCK seamlessly.

    I still believe my approach is sound, and I would even dare to say recommended, as the import procedure of CCK Content Copy should provide a way to perform a complete update of a content type, which is erroneously called synchronisation by the CCK_sync module.

    Nevertheless, my full update functionality is provided on top of the already existing functionalities in Content Copy, which means you will still be able to create a brand new content type (of course) and also to import just new fields in a pre-existing content type.

  • [Lorna Mitchell] PHP Advent 2009 December 4, 2009

    I’m very proud to be able to say “I’m a PHP Advent author” – I’ve been invited to take part in this year’s event and my article One Step at a Time is now live!

    My post this year is aimed as a reminder to us all that we can all aspire to better things, and lots of “better” eventually adds up to “pretty damn good”! If you read the post and have comments, add them here – and if you’ve chosen what one thing you’d like to change next, I’d be delighted to hear it. Whatever your next step, good luck :)

  • [Vincenzo Russo] Programmatic CCK Content Types – Updated December 3, 2009

    Not long ago, I published a HowTo for creating CCK Content Types programmatically, including the ability to update them via code. After some events, the HowTo needed an update, as it now (necessarily) depends on the Alternate Content Copy module.

    Step 0: pick a name and set up your module

    In this post I will be using example_cck_content as name for both the module and the content type. So, let’s create a folder called example_cck_content and underneath that the following files:

    • example_cck_content.module
    • example_cck_content.install
    • example_cck_content.info
    • example_cck_content.def.inc

    The first three files are common to almost every Drupal module. The last file will host the exported content type.

    Step 1: create your content type using CCK

    To create a new CCK content type go to: Administer » Content management » Content types » Add content type. For more information about this, read Getting started with CCK.

    Step 2: export your newly created content type

    To export the content type we just created, let’s go to Administer » Content management » Content types » Export. Follow the wizard instructions until you get to the screen with the exported code (read more details about the steps in this wizard in Step 2 of this post). The exported code will look like the following:

    $content['type'] = array (
      'name' => 'Example CCK Content',
      'type' => 'example_cck_content',
      // ... cut code ...
    $content['fields'] = array (
      0 => array (
        'label' => 'Box Image',
        'field_name' => 'field_boximage',
        // ... cut code ...
    $content['extra'] = array (
      'title' => '-5',
      'revision_information' => '20',
      'comment_settings' => '30',
      'menu' => '-2',
    );
    

    Step 3: paste the code in the module

    Let’s go back to the module we created in Step 0 and open the file example_cck_content.def.inc. In this file we will create a stub function:

    function _example_cck_content_cck_export() {
      // code of the exported content type goes here
      return $content;
    }

    Now, let’s go back to our Drupal site and copy the code from the screen we got in Step 2. Then, let’s paste it in place of the comment I put in the stub function above.

    Step 4: write the content type create/update function

    The function we are about to create will be invoked from both the hook_install and the hook_update_N. It invokes some CCK APIs in order to actually create/update the content type structure into the database.

    So, let’s open the example_cck_content.install and let’s write down the following function

    function _example_cck_content_save_cck_node() {
      module_load_include('inc', 'example_cck_content', 'example_cck_content.def');
      $content = _example_cck_content_cck_export();

      // We do not want too many modules enabled: the content_copy and
      // alternate_content_copy modules are only needed in order to install the
      // content type, so we just require them here (require_once prevents
      // including a file more than once in case the module is already enabled).
      require_once './' . drupal_get_path('module', 'content') . '/modules/content_copy/content_copy.module';
      require_once './' . drupal_get_path('module', 'alternate_content_copy') . '/alternate_content_copy.module';

      alternate_content_copy_import_content($content);
    }
    

    Step 5: install, uninstall and update_N hooks

    Here comes the interesting part. As in the classical way, we can implement hook_install to actually create the exported content type in the database, then implement hook_uninstall to remove the content type when the module is uninstalled. Plus, a nice addition that other tutorials did not cover in the past: the possibility of implementing hook_update_N. The latter is very important for amending the structure of a content type that is already in production.

    So, let’s keep open the example_cck_content.install and write down the following.

    /**
     * Implementation of hook_install
     */
    function example_cck_content_install() {
      _example_cck_content_save_cck_node();
    }

    /**
     * Implementation of hook_uninstall
     */
    function example_cck_content_uninstall() {
      // the type_name must be the type_name
      // as specified in the .def.inc file
      node_type_delete('example_cck_content');
      menu_rebuild();
    }

    // EXAMPLE hook_update_N
    // every hook_update_N will look the same
    // and you will need to write a new one
    // every time you update the .def.inc file
    //function example_cck_content_update_1() {
    //  _example_cck_content_save_cck_node();
    //  return array();
    //}
    

    The very interesting bit is the hook_update_N. Unlike the classical way, to amend a content type we will use the CCK interface, by going to Administer » Content management » Content types and then clicking Edit for our content type. Once done, we will export it again following Steps 2 and 3. Finally, we will add a hook_update_N exactly the way it is shown in the commented code above. All the hook_update_N implementations will look the same.

    Step 6: handling dependencies

    One mandatory dependency is the content module, also known as CCK. Plus, depending on which kinds of fields you have added to your content type, you might need to make your module dependent on one or more CCK field modules (e.g. filefield, etc.).

    In order to do so, we need to specify the dependencies in our example_cck_content.info file:

    ; $Id$
    name = Example CCK Content
    description = Provide an example CCK content type
    dependencies[] = content
    dependencies[] = filefield
    dependencies[] = imagefield
    dependencies[] = text
    core = 6.x

    Those are the actual dependencies of the module I created (and that you can download at the bottom of this post).

    However, this is not going to work properly (hopefully, only until Drupal 7). We need those dependencies enabled and installed before our hook_install is fired, but there is currently no mechanism that ensures that hook_install is run for dependencies before the dependent modules. As a result, the installation of our content type might fail if it depends on one or more currently disabled modules.

    In order to enforce a stronger dependencies handling, we can only rely on the hook_requirements for the time being. This hook implementation must reside in the example_cck_content.install.

    /**
    * Implementation of hook_requirements()
    */
    function example_cck_content_requirements($phase) {
      $requirements = array();
      $t = get_t();

      // an array of the dependencies
      // where the array key is the module machine-readable name
      // and the value is the module human-readable name
      $dependencies = array(
        'content' => 'Content',
        'filefield' => 'FileField',
        'imagefield' => 'ImageField',
        'text' => 'Text',
      );

      switch ($phase) {
        case 'install':
          $error = FALSE;
          $value = '';
          foreach ($dependencies as $dependency => $module_nice_name) {
            if (!module_exists($dependency)) {
              $error = TRUE;
              // pass the module name as a placeholder so that the
              // translatable string stays a literal
              $value .= $t('@module to be pre-installed; ', array('@module' => $module_nice_name));
              $severity = REQUIREMENT_ERROR;
            }
          }
          if ($error) {
            $requirements['example_cck_content'] = array(
              'title' => $t('Example CCK Content requires: '),
              'value' => $value . $t(' if the required modules are now installed, please enable this module again.'),
              'severity' => $severity,
            );
          }
          break;
      }

      return $requirements;
    }
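
    If you want to try the dependency check outside of a Drupal site, the lookup can be separated from the message building. The following is only a sketch: example_missing_dependencies() and example_fake_module_exists() are hypothetical names that are not part of the downloadable module, and the callback stands in for Drupal's module_exists().

```php
<?php
// Hypothetical helper: returns the dependencies that the $exists callback
// reports as missing. Inside Drupal 6, $exists would be 'module_exists'.
function example_missing_dependencies(array $dependencies, $exists) {
  $missing = array();
  foreach ($dependencies as $module => $nice_name) {
    if (!call_user_func($exists, $module)) {
      $missing[$module] = $nice_name;
    }
  }
  return $missing;
}

// Stand-in for module_exists(): pretend only Content and Text are enabled.
function example_fake_module_exists($module) {
  return in_array($module, array('content', 'text'));
}

$missing = example_missing_dependencies(
  array(
    'content' => 'Content',
    'filefield' => 'FileField',
    'imagefield' => 'ImageField',
    'text' => 'Text',
  ),
  'example_fake_module_exists'
);
```

    With the stand-in above, $missing contains only the filefield and imagefield entries, which are the ones the requirements message would list.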
    

    Step 7: download and test

    You can download the example module I created following this procedure and test that everything works as described. Enjoy!

  • [Vincenzo Russo] CCK – Alternate Content Copy December 3, 2009

    In the end, markus_petrus did not want to change the current behaviour of the CCK Content Copy module. Therefore, out of the box, it won’t allow you to use my way of creating CCK content types programmatically (well, not including updating them). But let’s forget about this for now. Let’s focus on CCK and the Content Copy module.

    Content Copy import form

    From the image above, we can see the standard Content Copy import form. At first sight, you understand that you can either create a brand new content type out of the code you are going to paste in the textarea or you can import the pasted content type into an existing one.

    Or at least, that is what you believe you can do. If you read carefully:

    This form will import field definitions exported from another content type or another database.
    Note that fields cannot be duplicated within the same content type, so imported fields will be added only if they do not already exist in the selected type.

    It imports fields. Not content types. What? Why is that? Everything, from the module name, to the menu items, the URL, the breadcrumb, suggests you are about to import a content type.

    Unfortunately that is only true if you are about to create a brand new content type out of the code pasted into the form.

    We’ve got problems

    So we have several problems with the Content Copy import.

    First, the interface is not very good, because you can misunderstand what it really does.

    Second, if the module imports just fields, then it should also export just fields, and ask the users for the content type details when they want a new content type to be created to import the fields into. But it does not. It exports and imports a full content type definition, which is an inconsistency. So, this module actually exports and imports either just the fields or the whole content type, depending on your choice. Well, that’s wrong.

    Third, we actually have a missing functionality, because we can:

    • Create a new content type out of the code;
    • Import new fields into an existing content type;

    but we can’t

    • Update a content type.

    But what does “update” really mean? It means you can take an existing content type and import a new definition into it. This implies adding, updating and removing fields, as well as updating the basic information about the content type, or the field associations with potential field groups.

    This is clearly something that Content Copy does not do. That’s why I created Alternate Content Copy.
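
    To make the notion of a full update concrete, here is a small sketch that compares two field lists and works out which fields would have to be added, updated or removed. It is not the code used by Alternate Content Copy: example_plan_field_update() is a hypothetical helper, and the only assumption about the data is that each field array carries a 'field_name' key, as in the exported $content['fields'].

```php
<?php
// Hypothetical helper, not part of CCK or Alternate Content Copy.
// Compares the fields of an existing definition with an incoming one.
function example_plan_field_update(array $existing, array $incoming) {
  // Index both definitions by field name so they can be compared.
  $old = array();
  foreach ($existing as $field) {
    $old[$field['field_name']] = $field;
  }
  $new = array();
  foreach ($incoming as $field) {
    $new[$field['field_name']] = $field;
  }

  $plan = array('add' => array(), 'update' => array(), 'remove' => array());
  foreach ($new as $name => $field) {
    if (!isset($old[$name])) {
      // Only in the incoming definition: must be created.
      $plan['add'][] = $name;
    }
    elseif ($old[$name] != $field) {
      // In both, but with different settings: must be updated.
      $plan['update'][] = $name;
    }
  }
  foreach ($old as $name => $field) {
    if (!isset($new[$name])) {
      // Only in the existing definition: must be removed.
      $plan['remove'][] = $name;
    }
  }
  return $plan;
}

$existing = array(
  array('field_name' => 'field_boximage', 'label' => 'Box Image'),
  array('field_name' => 'field_subtitle', 'label' => 'Subtitle'),
);
$incoming = array(
  array('field_name' => 'field_boximage', 'label' => 'Box Picture'),
  array('field_name' => 'field_summary', 'label' => 'Summary'),
);
$plan = example_plan_field_update($existing, $incoming);
```

    In this made-up example field_summary ends up under 'add', field_boximage under 'update' and field_subtitle under 'remove'. A plain field import would only ever cover the first of the three.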

    Alternate Content Copy

    Alternate Content Copy import form

    Alternate Content Copy hooks into the Content Copy module to alter its behaviour. Above you can see the altered interface of the import form. Now it hopefully communicates correctly what it actually does. All the original functionality is left untouched; I only added a fully functional content type update, i.e. adding, updating and removing fields, as well as updating the basic information about the content type, or the field associations with potential field groups.

    Even though I am still wondering whether or not I should add a separate entry in the drop-down that says Import the entire content type as update, if you now select the item Import the entire content type, two things can happen:

    [...] if this does not exist in the installation, it will be created. On the contrary, if it does exist, it will be replaced with the new version.

    And that is all for now. I will talk about creating CCK content types programmatically again in the next post, in the light of this new module I wrote.

    Download

    The module can be downloaded at the following link. It will be soon submitted to the Drupal.org project repository. The version numbering follows the one from the Content Copy module. The beta status is due to the lack of extensive testing. Apart from that, everything looks stable and fine.

    Note: There is a file embedded within this post, please visit this post to download the file.

  • [Lorna Mitchell] PHPWomen Calendar 2010 December 2, 2009

    I’m halfway delighted and halfway cringing to announce that the phpwomen calendar is now on sale. This was a project organised by my friend and colleague Johanna Cherry, who saw an opportunity at php|tek 2009 to photograph the majority of the core PHPWomen members all in one place and turn it into a fundraising calendar.

    If you’re expecting something “Calendar Girls” then you’ll be disappointed. We are all clothed in the pictures!

    I won’t share photos from the calendar itself, if you want to see those you can buy your own, but perhaps to give you a hint I’ll share an outtake of myself:

    I must confess that I was rather agitated when the photos were taken – as a woman in a male-dominated industry, the risk of being seen as just my physical appearance is ever-present, and I normally try hard at unremarkable, unrevealing clothes with very little makeup and a pair of jeans. Hanging out in the lobby at the hotel during a technical conference in that dress and those shoes was significantly more terrifying than delivering three sessions during my first trip to the US as a speaker (which, considering the problems I have with speaking nerves, is saying something). Even after I saw the photos I was kind of unhappy with the whole experience, although I loved the outtake linked above!

    Fast forward 6 months and I had dinner with Derick Rethans, who took the photos in the calendar and arranged the printing, and he showed me the prototype he’d had printed. As I sat and turned the pages, I started to understand why this is so important. The women in these photos are some of the leading lights in the community – respected developers, some of them core developers, key community people, and speakers. Yet I saw them as the women they are … and suddenly remembered that actually, it’s acceptable to be both smart AND beautiful.

    So – get your calendar and remember all year that beauties can also be geeks! 10% of every purchase goes to PHPWomen, and we will use those funds to support our women and grow more leading lights like these.

  • [Ian Barber] Support Vector Machines In PHP December 2, 2009

    When it comes to classification, and machine learning in general, at the head of the pack there’s often a Support Vector Machine based method. In this post we’ll look at what SVMs do and how they work, and as usual there’s some example code. However, even a simple PHP-only SVM implementation is a little bit long, so this time the complete source is available separately in a zip file.

    The classification problem is something we’ve discussed before, but in general it is about learning what separates two sets of examples and, based on that, correctly putting unseen examples into one or the other set. An example could be a spam filter which, given a training set of spam and non-spam mails, is expected to classify email as either spam or not spam.

    SVMs are systems for doing exactly that, but they only care about points in spaces, rather than emails or documents. To this end a vector space style model is used to give each word in a document an ID (the dimension) and a weight based on its importance to the document (the document’s position in that dimension). Given a series of examples of documents in a vector space, each tagged with a ‘class’ or ‘not class’ designation, the support vector machine tries to find the line that best divides these two classes, and then classifies unseen documents based on which side of the line they appear.
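
    As a rough illustration of that vector space step, the sketch below turns a piece of text into a sparse, unit length vector. It is not taken from the PHPSVM source: example_vectorise() is a hypothetical helper, and using a plain word count as the weight is an assumption made to keep the example short.

```php
<?php
// Hypothetical helper: maps a document to a sparse vector of
// dimension ID => weight, scaled to unit length.
// $dictionary maps each word to its dimension ID and grows as needed.
function example_vectorise($text, array &$dictionary) {
    $vector = array();
    $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
        if (!isset($dictionary[$word])) {
            // New word: give it the next free dimension ID.
            $dictionary[$word] = count($dictionary) + 1;
        }
        $id = $dictionary[$word];
        // Assumed weighting: a simple count of occurrences.
        $vector[$id] = isset($vector[$id]) ? $vector[$id] + 1 : 1;
    }

    // Scale the vector to length 1.
    $length = 0;
    foreach ($vector as $weight) {
        $length += $weight * $weight;
    }
    $length = sqrt($length);
    foreach ($vector as $id => $weight) {
        $vector[$id] = $weight / $length;
    }
    ksort($vector);
    return $vector;
}

$dictionary = array();
$vector = example_vectorise('cheap pills cheap offer', $dictionary);
```

    Here "cheap" becomes dimension 1, "pills" dimension 2 and "offer" dimension 3, with "cheap" carrying twice the weight of the other two before scaling.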

    Linearly Separable Classification

    Some linearly separable points

    It’s easy to visually separate the two classes above because the problem is in two dimensions. Once we extend this to thousands of dimensions it becomes trickier for a human, but there are a whole bunch of algorithms that will cleanly separate the training cases by defining a line which divides the two sets – but what makes one solution better than another? The intuition that drives SVMs is that the line which best divides the two cases is the one that has the largest margin between it and the nearest training examples on either side. Therefore, the important example vectors are the ones that define that margin – the ones closest to the dividing line. These are the support vectors, and it is a combination of them that provides the decision (class or not class) function for an SVM classifier. The function in our example code looks like this:

    <?php
    protected function classify($rowID) {
            $score = 0;
            foreach($this->lagrangeMults as $key => $value) {
                    if($value > 0) {  
                            $score += $value * $this->targets[$key] * $this->kernel($rowID, $key);
                    }
            }
            return $score - $this->bias;
    }
    ?>

    The judgement would then be made on whether the score was positive or negative, which represents which side of the line the document in question was on. Each of the training examples which supports the dividing line (or more accurately the dividing hyperplane) has a value in the array lagrangeMults which effectively defines its weight. The targets array contains which class (represented by +1 or -1) the example was in. The kernel function for now can be thought of as just the dot product between the two vectors.
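
    To see the decision function at work on its own, here is a cut-down sketch with a plain dot product as the kernel. The function names and the tiny hand-picked model (two support vectors on a single dimension) are made up for illustration and are not part of the PHPSVM class.

```php
<?php
// Dot product of two sparse vectors stored as dimension => weight.
function example_dot(array $a, array $b) {
    $sum = 0;
    foreach ($a as $dimension => $weight) {
        if (isset($b[$dimension])) {
            $sum += $weight * $b[$dimension];
        }
    }
    return $sum;
}

// Sum of multiplier * target * dot(support vector, x), minus the bias.
function example_classify(array $vectors, array $multipliers, array $targets, $bias, array $x) {
    $score = 0;
    foreach ($multipliers as $key => $value) {
        if ($value > 0) {
            $score += $value * $targets[$key] * example_dot($vectors[$key], $x);
        }
    }
    return $score - $bias;
}

// Hand-picked model: one support vector per class, at +1 and -1.
$vectors = array(array(1 => 1.0), array(1 => -1.0));
$multipliers = array(0.5, 0.5);
$targets = array(1, -1);

$positive = example_classify($vectors, $multipliers, $targets, 0, array(1 => 2.0));
$negative = example_classify($vectors, $multipliers, $targets, 0, array(1 => -3.0));
```

    The point at 2.0 gets a positive score and the point at -3.0 a negative one, so they land on opposite sides of the dividing line.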

    The points with a dividing plane

    The final line of the classifier adjusts for a bias value. In a 2D case, a line can be defined as y = ax + b, with b giving the y intercept at x = 0. In the same way a hyperplane has a b (using w^T x + b, where w is a vector normal, or perpendicular, to the hyperplane) that specifies an offset (so that it does not have to go through the origin), represented by bias. The T stands for transpose, and w^T x is basically the dot product of the two vectors. Effectively we’re trying to learn the weights of the different examples, discarding all the 0 weighted ones (which therefore aren’t support vectors), and calculating the bias. This is not amazingly hard in practice, but it does require a bit of maths.

    If we take our vector w, which is normal to the hyperplane, then we can say that the point on the hyperplane closest to an example x is x - y r (w / |w|). That is the vector x, minus y (the class, +1 or -1, which just changes the sign) times some multiplier r, times the unit version of vector w. The unit version of w is just w divided by its length, the length being denoted by pipe characters around the vector. We know that this point must lie on the decision hyperplane, so we can substitute it into the hyperplane equation, then shuffle it around to solve for r:

    r = y * (w^T x + b) / |w|

    With this we can calculate the distance to the boundary for any given example vector, and so calculate the margin of our classifier. If we use normalised (length 1) vectors, we can then drop the division and stick with r = y * (w^T x + b). We want to make sure there is a margin, so we can define a constraint that r >= 1. We then have an optimisation problem with an objective function (that the margin is maximised) and a constraint, r >= 1.

    The way this problem is solved is by breaking it down. Firstly, maths optimisation problems are conventionally stated as minimisations rather than maximisations, and we can flip this one around by saying we wish to minimise 1/2 w^T w. We then fold the constraints into the equation as a series of multipliers, which gives us w = ∑ alpha_i y_i x_i. ∑ means foreach i in this case, and alpha is the weight, more properly referred to as the Lagrange multiplier. The bias is b = y_k - w^T x_k for any vector x_k where alpha_k is not 0.

    Of course this does assume that there is a clean dividing hyperplane. This is not always so, as real world data has outliers and misclassified examples, so some tolerance has to be introduced. This is handled by an upper bound on the Lagrange multipliers, which allows the optimisation to trade off some misclassification in exchange for a fatter margin.
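
    As a quick numerical check of the distance formula r = y * (w^T x + b) / |w|, here is a minimal sketch using dense vectors. The function name and the example hyperplane are invented for the illustration.

```php
<?php
// Distance r = y * (w.x + b) / |w| of example x (class y) from the
// hyperplane defined by w and b. $w and $x are dense arrays with the
// same keys.
function example_margin_distance(array $w, $b, array $x, $y) {
    $dot = 0;
    $squares = 0;
    foreach ($w as $i => $weight) {
        $dot += $weight * $x[$i];
        $squares += $weight * $weight;
    }
    return $y * ($dot + $b) / sqrt($squares);
}

// Made-up hyperplane 3 * x1 + 4 * x2 - 5 = 0, so |w| = 5.
$w = array(3, 4);
$b = -5;

$rPositive = example_margin_distance($w, $b, array(3, 4), 1);
$rNegative = example_margin_distance($w, $b, array(0, 0), -1);
```

    The positive example at (3, 4) sits 4 units from the hyperplane and the negative example at the origin 1 unit away. Both distances come out positive because each example is on the side that matches its class.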

    The Kernel Trick

    So, with the algorithm we can get the best dividing hyperplane possible, but it’s still, basically, a straight line. This is a problem because there are many problems where a linear classifier simply cannot get good accuracy because of the distribution of the classes in the vector space – there’s simply no straight line or flat plane that can cleanly divide them.

    A non-linearly separable space

    However, the dot product in the SVM classification function doesn’t have to be a dot product. In fact, it can be any function that obeys certain mathematical conditions (functions that do are referred to as Mercer kernels), which allows us to plug in another type of function. This kernel function can effectively define a mapping between the dimensions of the problem space, and some higher dimensional space where perhaps the data will be linearly separable. The function used in our PHP SVM looks like this:

    <?php
    protected function kernel($indexA, $indexB) {
            $score = 2 * $this->dot($indexA, $indexB);             
            $xsquares = $this->dot($indexA, $indexA) +
                            $this->dot($indexB, $indexB);
            return exp(-self::GAMMA * ($xsquares - $score));
    }
    ?>

    This is an implementation of a Gaussian Radial Basis Function, which maps onto a space with infinite dimensions. This trick allows good classification results on a variety of different sources, though it does come at a cost in terms of extra complexity and computing cost.
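
    It may not be obvious that the dot products in the kernel add up to a Gaussian. Since a.a + b.b - 2 * a.b is the squared distance between a and b, the kernel is exp(-gamma * |a - b|^2). The sketch below computes it directly from the distance on sparse vectors. The function name is hypothetical, and gamma is passed in as a parameter because the value of the GAMMA constant isn't shown here.

```php
<?php
// Gaussian RBF kernel computed from the squared distance between two
// sparse vectors (dimension => weight). $gamma is an assumed parameter.
function example_rbf(array $a, array $b, $gamma) {
    $squaredDistance = 0;
    // The array union gives every dimension present in either vector.
    foreach ($a + $b as $dimension => $unused) {
        $diff = (isset($a[$dimension]) ? $a[$dimension] : 0)
              - (isset($b[$dimension]) ? $b[$dimension] : 0);
        $squaredDistance += $diff * $diff;
    }
    return exp(-$gamma * $squaredDistance);
}

$same = example_rbf(array(1 => 1.0), array(1 => 1.0), 0.5);
$apart = example_rbf(array(1 => 1.0), array(2 => 1.0), 0.5);
```

    Identical vectors give a kernel value of 1, and the value falls towards 0 as the vectors move apart, which is what lets nearby support vectors dominate the score.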

    Sequential Minimal Optimisation

    One of the problems with SVMs is that solving the equation to get the Lagrange multipliers for each of the support vectors is a knotty Quadratic Programming problem, and generally requires the use of a specialised library. People improved the situation by chunking the QP into smaller, more manageable problems, but John Platt took the idea further by breaking the QP down into a series of the smallest possible problems – solving for just a single pair of multipliers at a time. While it will look at more pairs than a more specialised library would have to, each pair can be solved in a fairly straightforward manner. This results in a faster running algorithm in many cases, and because large matrix operations aren’t used the memory usage is much more linear with the number of training examples.

    SMO is somewhat complex, but broadly can be broken into three parts: a heuristic for choosing which pair of training examples to optimise, a method for doing the optimisation, and a method for calculating the bias, or b. The heuristic used to choose the two points starts by examining a given point, then comparing it with the current support vectors. The difference in the results of the two is compared, and the point with the highest difference is chosen to optimise against (if there is one).

    <?php
    foreach($this->lagrangeMults as $id => $value) {
            if($value > 0 && $value < self::UPPER_BOUND) {
                    $result2 = $this->errorCache[$id];
                    $temp = abs($result - $result2);
                    if($temp > $maxTemp) {
                            $maxTemp = $temp;
                            $otherRowID = $id;
                    }
            }
    }
    ?>

    If this doesn’t find anything then the algorithm chooses any valid support vector that will optimise and uses that. If nothing comes from that loop, it finally just iterates through all the examples looking for one that will optimise. In both cases the start point is randomised.

    <?php
    $endPoint = array_rand($this->lagrangeMults);
    for($k = $endPoint; $k < $this->recordCount + $endPoint; $k++) {
            $otherRowID = $k % $this->recordCount;
                                           
            if($this->lagrangeMults[$otherRowID] > 0
                    && $this->lagrangeMults[$otherRowID] < self::UPPER_BOUND) {
                    if($this->takeStep($rowID, $otherRowID) == 1) {
                            return 1;
                    }
            }
    }
                           
    $endPoint = array_rand($this->lagrangeMults);
    for($k = $endPoint; $k < $this->recordCount + $endPoint; $k++) {
            $otherRowID = $k % $this->recordCount;
           
            if($this->takeStep($rowID, $otherRowID) == 1) {
                    return 1;
            }
    }
    ?>

    The actual optimisation is performed in the takeStep function. Initially the classification score is calculated for each point, and an overall score determined by multiplying them together. From that the constraint on the Lagrange multipliers alpha1 and alpha2 is calculated.

    <?php
    $low = $high = 0;
    if($target1 == $target2) {
            $gamma = $alpha1 + $alpha2;
            if($gamma > self::UPPER_BOUND) {
                    $low = $gamma - self::UPPER_BOUND;
                    $high = self::UPPER_BOUND;
            } else {
                    $low = 0;
                    $high = $gamma;
            }
    } else {
            $gamma = $alpha1 - $alpha2;
            if($gamma > 0) {
                    $low = 0;
                    $high = self::UPPER_BOUND - $gamma;
            } else {
                    $low = -$gamma;
                    $high = self::UPPER_BOUND;
            }
    }
    ?>

    This is a few lines, but it is really just ensuring that the new multiplier stays between 0 and the upper bound. If the classes on the two examples are the same, the low value is max(0, $gamma - upper bound) and the high is min(upper bound, $gamma); if they’re different, the low is max(0, -$gamma) and the high is min(upper bound, upper bound - $gamma). Once we have the constraints we need to find the maximum margin while only allowing the two multipliers in question to change. Most of the time we can find the maximum in a straightforward way by dividing the result difference by a value obtained from the maximisation function earlier ($eta in the example).
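
    The same bounds can be written more compactly with max() and min(). This is a sketch of an equivalent form rather than the code in the zip file; example_bounds() is a made-up name and the upper bound is passed as a parameter instead of using the class constant.

```php
<?php
// Returns array($low, $high) for the second multiplier, equivalent to
// the if/else version above.
function example_bounds($alpha1, $alpha2, $target1, $target2, $upperBound) {
    if ($target1 == $target2) {
        $gamma = $alpha1 + $alpha2;
        return array(max(0, $gamma - $upperBound), min($upperBound, $gamma));
    }
    $gamma = $alpha1 - $alpha2;
    return array(max(0, -$gamma), min($upperBound, $upperBound - $gamma));
}

// Same class, multipliers summing to more than the upper bound of 1.
list($lowSame, $highSame) = example_bounds(0.8, 0.6, 1, 1, 1.0);
// Different classes, first multiplier smaller than the second.
list($lowDiff, $highDiff) = example_bounds(0.3, 0.6, 1, -1, 1.0);
```

    In the first case the new multiplier has to stay between 0.4 and 1, and in the second between 0.3 and 1.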

    <?php
    $a2 = $alpha2 + $target2 * ($result2 - $result1) / $eta;
    if($a2 < $low) {
            $a2 = $low;
    } else if($a2 > $high) {
            $a2 = $high;
    }
    ?>

    Calculating the other multiplier is pretty straightforward, most of the time, from looking at the original multipliers and the new $a2 multiplier.

    <?php
    $a1 = $alpha1 - $score * ($a2 - $alpha2);
    ?>
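
    The reason the first multiplier can be derived so directly is that the pair has to keep target1 * alpha1 + target2 * alpha2 unchanged. The sketch below checks that numerically. It assumes that $score is the product of the two targets, as in Platt's description of SMO; that assignment is not shown in the snippets here, and the function name and values are made up.

```php
<?php
// Assumption: $score is target1 * target2 (s = y1 * y2 in Platt's SMO).
function example_update_alpha1($alpha1, $alpha2, $a2, $target1, $target2) {
    $score = $target1 * $target2;
    return $alpha1 - $score * ($a2 - $alpha2);
}

// Made-up values: two examples from different classes.
$alpha1 = 0.4;
$alpha2 = 0.7;
$target1 = 1;
$target2 = -1;
$a2 = 0.5; // the new, clipped second multiplier

$a1 = example_update_alpha1($alpha1, $alpha2, $a2, $target1, $target2);

// The weighted sum of the pair is the same before and after the step.
$before = $target1 * $alpha1 + $target2 * $alpha2;
$after = $target1 * $a1 + $target2 * $a2;
```

    Here the second multiplier drops by 0.2, so the first drops by 0.2 as well (to 0.2), and the weighted sum stays at -0.3.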

    The bias (b) update is pretty simple once the new Lagrange multipliers have been determined. A bias is calculated for each based on the current bias, and the change in multiplier. If the first multiplier is valid (strictly between 0 and the upper bound) the bias generated from it is used, else if the second is valid its bias is used; if neither is, the average of the two new biases is used.

    <?php
    $b1 = $this->bias + $result1 +
            $target1 * ($a1 - $alpha1) * $k11 +
            $target2 * ($a2 - $alpha2) * $k12;
    $b2 = $this->bias + $result2 +
            $target1 * ($a1 - $alpha1) * $k12 +
            $target2 * ($a2 - $alpha2) * $k22;
    if($a1 > 0 && $a1 < self::UPPER_BOUND) {
            $newBias = $b1;
    } else if($a2 > 0 && $a2 < self::UPPER_BOUND) {
            $newBias = $b2;
    } else {
            $newBias = ($b1 + $b2) / 2;
    }
    $deltaBias = $newBias - $this->bias;
    $this->bias = $newBias;
    ?>

    Using The Classifier

    The classifier is pretty straightforward to use, as long as the data is formatted appropriately. The format for the data is based on the svmlight format, and is pretty convenient for text classification – all that is needed is an ID for each term, and a unit length document vector. The format is:

    +1 1:0.049 45:0.029…..

    With the first entry being +1 or -1 for the two classes (or 0 for data to be classified), and then pairs of key:value entries in increasing dimension order. The dimensions don’t have to be contiguous though – dimensions with value 0 can be omitted. Using the example data in the zip file, we can trigger it to train:
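
    If you need to read or check data files in this format yourself, a line can be split up with a few string functions. This is a minimal sketch and not the parser from the PHPSVM class; example_parse_line() is a hypothetical name.

```php
<?php
// Hypothetical parser for one line of the format described above.
// Returns array($target, $vector), with $vector as dimension => weight.
function example_parse_line($line) {
    $parts = preg_split('/\s+/', trim($line));
    // First entry is the class: +1, -1, or 0 for unclassified data.
    $target = (int) array_shift($parts);
    $vector = array();
    foreach ($parts as $pair) {
        list($dimension, $value) = explode(':', $pair, 2);
        $vector[(int) $dimension] = (float) $value;
    }
    return array($target, $vector);
}

list($target, $vector) = example_parse_line('+1 1:0.049 45:0.029');
```

    For the sample line this gives a target of 1 and a vector with weights on dimensions 1 and 45 only; every omitted dimension is implicitly 0.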

    <?php
    $svm = new PHPSVM();
    $svm->train('data/train.dat', 'model.svm');
    ?>

    And to classify, which you’re likely to do repeatedly (though for testing, the train function can actually take the data path as a third argument):

    <?php
    $svm = new PHPSVM();
    $svm->test('data/test.dat', 'model.svm', 'output.dat');
    ?>

    This will output the results for each input to the output.dat file, and print an overall accuracy if the test data has judgements along with it. It gives a pretty good result, though it’s worth noting that the included training and test data are well balanced (similar numbers of positive and negative examples), so your mileage may vary.

    Accuracy: 0.92976588628763 over 598 examples.

    For more complex usage, or for more examples where memory limits become an issue, you will probably want to use one of the excellent SVM systems available such as LibSVM / LibSVM+ or SVMlight (though SVMlight is only free for non-commercial use). These offer great speed and accuracy, and are at the cutting edge of extensions to the algorithms. However, it may be preferable to actually classify examples in PHP using a model generated by one of those systems, which is quite doable. Special thanks to Lorenzo and Vincenzo for their comments about the PHPSVM class.

  • [Lorna Mitchell] Speaking at PHP Benelux 2010 November 30, 2009

    I’m delighted to announce that I’m speaking at the inaugural PHP Benelux Conference, to be held on Saturday 30th January in Antwerp, Belgium. The talk will be “Passing the Joel Test in the PHP World”; I gave this talk at PHPNW09 in October and it was well-received there, so hopefully I can bring the same insight and inspiration to attendees at this new event as well!

    On a personal level I’m very pleased to have a reason to visit the Low Countries – Ibuildings is a Dutch company and I’m already making plans to link up with my colleagues there by extending the trip by a few days. I’ve also never been to Antwerp so I’m hoping I’ll see something of the city while I’m there, if time allows. The Benelux user group contains many friends so I’m looking forward to what I know will be an excellent event and catching up with all the friends who will be there.

    If you are attending, or thinking of it, let me know – and come and say “hi” to me on the day :)