Cloning a VMware Server VM

I recently had a need to make a bunch of clones of a vmware virtual image on my vmware server. After doing a few by hand, I got tired of it and wrote a little script to do it for me. The script assumes that there’s a working set of virtual image files in a directory named “vm-template” and that the virtual machine name defined in the template is also “vm-template.” You can change these values by changing the SOURCE_DIR and SOURCE_NAME variables at the top of the file (or you could modify the script to set these variables using passed arguments). Whatever image you wind up using to do the cloning, you should make sure it isn’t running in order to avoid unpredictable results when doing the copy.

To use the script, just run it, passing the desired new virtual image name as an argument. A directory will be created using that name (so avoid spaces or weird characters; escaping them might also be ok). Files from the image to be cloned will then be copied into the new directory. The SOURCE_NAME value in the source image’s .vmx file will be replaced with the name you pass as an argument, and all files will be renamed to use the argument’s name rather than the SOURCE_DIR name. To clarify: Typically, your source image will live in a directory named (for example) “vm-template” and will be full of files named (for example) “vm-template.vmx,” “vm-template.vmdk,” etc. The script renames any such matching files to use the argument passed, and it changes references to those names within the .vmx file to point to the renamed file.

If your source image is large, it could take a few minutes to copy the files. The rest of the process goes quickly. Once you’re done, if you’re using vmware server, you’ll want to pick the option to add a new machine to the inventory. Then browse to the new .vmx file. When you boot it up, you should see (I’m using the web console here) a warning asking you whether you copied or moved the image. This is because we didn’t do anything to change the UUID that identifies the image. Tell vmware server that you copied it, and it should make any necessary adjustments and boot your image.

Once it’s booted, you’ll need to make adjustments such as changing the hostname and applying any patches or updates that may have landed since you created the source image.

The script I used follows.

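#!/bin/bash

# Directory holding the template image, and the VM name used inside it.
SOURCE_DIR="vm-template"
SOURCE_NAME="vm-template"

DEST_NAME="$1"

if [ -z "$DEST_NAME" ]; then
    echo
    echo "Please specify a VM name as the sole argument"
    echo
    exit 1
fi

if [ -e "$DEST_NAME" ]; then
    echo
    echo "$DEST_NAME already exists. Please specify another."
    echo
    exit 2
fi

echo

# Bail out (exit 3) if the directory can't be created, copied into, or entered.
mkdir "$DEST_NAME" || exit 3
echo "Copying source files to $DEST_NAME directory"
echo
cp -R "$SOURCE_DIR"/* "$DEST_NAME"/ || exit 3

cd "$DEST_NAME" || exit 3

# Rename every file whose name contains the template name.
for file in *"$SOURCE_NAME"*; do
    [ -e "$file" ] || continue
    new="${file/$SOURCE_NAME/$DEST_NAME}"
    mv "$file" "$new"
done

# Point references inside the .vmx file at the renamed files.
# \Q...\E keeps any regex metacharacters in the template name literal.
SRC="$SOURCE_NAME" DST="$DEST_NAME" perl -pi -e 's/\Q$ENV{SRC}\E/$ENV{DST}/g' *.vmx

# Remove stale lock files copied over from the template.
rm -rf *.lck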
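
To check the rename-and-rewrite steps without copying a real image, something like the following can be run against dummy files in a scratch directory. It is only a sketch: the clone_names function name, the dummy .vmx contents, and the "web01" name are assumptions made for illustration, and sed stands in for the perl one-liner.

```shell
#!/bin/bash
# Sketch: exercise the rename + .vmx rewrite steps against dummy files.
# clone_names is a hypothetical helper, not part of the script above.
clone_names() {
    local dir="$1" src="$2" dest="$3" file
    (
        cd "$dir" || exit 1
        # Rename every file whose name contains the template name.
        for file in *"$src"*; do
            [ -e "$file" ] || continue
            mv "$file" "${file/$src/$dest}"
        done
        # Rewrite references inside the .vmx file(s) to the new names.
        for file in *.vmx; do
            [ -e "$file" ] || continue
            sed "s/$src/$dest/g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
        done
    )
}

# Build a fake template: one .vmx that points at one (empty) .vmdk.
scratch=$(mktemp -d)
printf 'displayName = "vm-template"\nscsi0:0.fileName = "vm-template.vmdk"\n' \
    > "$scratch/vm-template.vmx"
: > "$scratch/vm-template.vmdk"

clone_names "$scratch" "vm-template" "web01"
ls "$scratch"
cat "$scratch/web01.vmx"
```

If the listing shows only web01.* files and the .vmx contents mention web01 rather than vm-template, the same logic should behave on a real copy.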

Using RewriteMap for query string voodoo

I had the need today to come up with an apache rewrite that would, in some cases, change the value of a query string parameter. So for example, for requests coming from anywhere but example.com with a URI beginning “/path” and with a query string parameter named “foo” with the value beginning “bar”, I needed to rewrite the value to be “baz”. I spent some time fooling around with backreferences in the RewriteRule, but I never came up with anything that worked. Eventually, I turned to RewriteMap, which lets you specify text files, hashes, or even external scripts whose output values will be inserted into the destination for the rewrite. So in my example, here’s the apache config:

# A prg: map starts script.pl once, at server start, and sends it one lookup key per line.
RewriteMap partner_params prg:/path/to/script.pl
RewriteCond %{HTTP_REFERER} !example\.com
RewriteCond %{REQUEST_URI} ^/path
RewriteCond %{QUERY_STRING} foo=bar
RewriteRule ^(.*)$ http://mysite.com$1?${partner_params:%{QUERY_STRING}} [R=301,L]

The RewriteMap line points to a script (source below) that the RewriteRule consults for each matching request. The first condition specifies that the referring URL does not contain “example.com”. The second condition specifies that the URI begins with “/path”. And the third condition specifies that the query string must have a key named “foo” with a value beginning with “bar”. The rule itself takes any matching request and diverts it to http://mysite.com with the same URI as the original request (so something beginning “/path”), then adds a question mark to denote a query string following. Then it passes the query string to the script that RewriteMap knows as “partner_params”. That script reads each lookup key from STDIN and prints to STDOUT either a newline-terminated result or the four-character response “NULL”. If not NULL, the response is what gets substituted into the RewriteRule.

So now for the script:

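#!/usr/bin/perl
use strict;
use warnings;

# Unbuffered output: apache waits for each reply line, so buffering would hang the rewrite.
$| = 1;

# apache sends one lookup key (here, the query string) per line on STDIN.
while (my $line = <STDIN>) {
    $line =~ s/foo=bar/foo=baz/g;
    print $line;
}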

Here we simply do a substitution, looking for foo=bar and replacing with foo=baz. And voila, we’ve got custom inline query string munging based on parameters available via pretty standard apache Rewrite data. My particular example probably describes a pretty rare need, but knowing how to have a rewrite call a script to do more complex parsing than is available via apache configuration directives could be handy in a number of ways.
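
A map program like this can be checked by hand before apache ever sees it: feed it lines on stdin and look at what comes back. The sketch below is a shell stand-in for the Perl script (same one-line-in, one-line-out behavior), so it runs without any other files. The partner_params function simply borrows the map's name, and the sample query strings are made up.

```shell
#!/bin/bash
# Shell stand-in for the map program: one key per line in, one result per line out.
# The real map is the Perl script; this function only mimics its behavior.
partner_params() {
    local line
    while IFS= read -r line; do
        printf '%s\n' "${line//foo=bar/foo=baz}"
    done
}

# Simulate two lookups, as apache would send them.
printf '%s\n' 'foo=bar&page=2' 'page=2&foo=barista' | partner_params
```

The second sample shows a side effect of matching on a prefix: a value that merely begins with “bar” has only that prefix swapped, so it comes back as foo=bazista.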

Stage

For a long time at my day job, one of our big web site issues has been the staging of database-driven content. Particularly if you’re editing Drupal pages that have a lot of markup in them, publishing a node can be sort of scary, as it goes live instantly with any bugs you’ve introduced. In theory, Drupal’s preview feature can be used to view your changes before you commit to them, but this too is scary, as the content isn’t rendered exactly as it will be once published. Further, using vanilla Drupal with its preview function to stage content requires that you roll out changes one by one. If you want to group changes for a mass rollout, the best you can do is wrap your changes in html comments and uncomment them one by one during deployment, hoping you don’t fat-finger anything in the process. I’ve always thought this would be a pretty difficult problem to solve, but yesterday, I came up with what feels like a satisfactory method for staging content.

The new stage module addresses both safety-netted staging of individual content and management of change sets.

It works by tapping into Drupal’s revision system, which already allows you to track changes to content over time and to revert to older content. For specified types of content, any additions or edits are published using the normal Drupal workflow, but on publish, the revision number is pinned at its last blessed point. You can edit or add any number of documents, and they all remain pinned at their pre-edit revision until you roll the whole batch of changes forward. When you roll a batch forward, all the revision numbers are brought to their most recent and pinned there until the next deployment. In the administration section, you identify staging and production servers. If you view an affected node from one of the specified staging hosts, you see the latest copy; if you view it from a production host, you see the pinned version.

This workflow is ideal for environments in which fairly frequent milestones are deployed. Because of Drupal’s handy dandy revision system, you can compare versions of the content across pushes to see what’s changed.

The module is hot off the presses this morning and so is probably still buggy and feature-poor, but it’s a start.

Phpbb3 import error: bbcode_uid truncation

I recently upgraded an install of phpbb to phpbb3. Shortly thereafter, I moved the site that the forum runs on to different hardware after several days of downtime on the original hardware (and an unresponsive vendor). To move to the new hardware, I dumped the database to a text file, compressed it, and shot the database and all site files across the network to the new hardware. Then I uncompressed the database and slurped it into mysql. Simple enough. What I hadn’t considered in advance was the fact that I was moving from mysql4 to mysql5. Accordingly, some weird things started happening when I started testing the site on the new hardware. I googled around a bit to discover that some of the problems were a result of the mysql upgrade, and I finally found this script, which purports to solve the problems by modifying the database structure. The script seemed to work just fine. The problems I had seen went away, and I figured the migration was a success.

But then somebody in the forums pointed out that bbcode throughout the site was messed up. And sure enough, all posts that had been imported had weird extra characters appended to bbcode blocks, which kept the bbcode from being converted into the appropriate html. For example, a block of bbcode might look like this: [quote="username"scd]stuff[/quote:scd]. But the characters were never consistent across posts. A bit more googling turned up the fact that phpbb has a field called bbcode_uid that is supposed to allow eight characters, but either when moving from mysql4 to mysql5 or as part of that nifty script I ran (I’m not sure which), the field gets truncated to five characters, which lops off the last three characters of an eight-character bbcode_uid, which ultimately results in the weird display we found.

What’s going on is that parsing nested tags (e.g. “[quote][b][url][/url][/b][/quote]”) can become laborious for the server, especially when tags don’t get closed properly. To make it more surefire and to simplify the process, phpbb appends a bbcode_uid to any bbcode inserted. So when you type “[url]http://daryl.learnhouston.com[/url]”, what actually gets inserted into the database is something like “[url:d98cJ1pv]http://daryl.learnhouston.com[/url:d98cJ1pv]”. This makes it so that you’re not having to figure out arbitrary nesting, because every opening tag has a corresponding unique end tag; you don’t have to find a beginning tag’s mate by parsing a string recursively, in other words. It’s a really cool idea. Of course, to remove the bbcode_uids from posts as a page is built, you need to store the bbcode_uid associated with a given post, so that it can be stripped out once tags are matched to one another. This is the bbcode_uid field in the posts table. And this field has just been truncated to five characters by the database move. Which means that when phpbb tries to find the bbcode_uid value within a given post, it finds and strips out only the first five characters, which results in three weird characters being appended to bbcode tags and the improper display of bbcode. In every single post and every single signature of your forums, which in my case was nearly 200,000 posts.
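
The effect of the truncation is easy to reproduce outside the forum. The following is a rough shell sketch, not phpbb's code (which is PHP): strip_uid is a hypothetical helper, and it assumes the colon is removed along with the stored uid.

```shell
#!/bin/bash
# Sketch of how a stored bbcode_uid gets stripped from post text at display time.
# strip_uid is a hypothetical helper; phpbb does the equivalent in PHP.
strip_uid() {
    local text="$1" uid="$2"
    local pat=":$uid"
    printf '%s\n' "${text//$pat/}"
}

post='[url:d98cJ1pv]http://daryl.learnhouston.com[/url:d98cJ1pv]'

strip_uid "$post" 'd98cJ1pv'   # full eight-character uid: tags come out clean
strip_uid "$post" 'd98cJ'      # truncated to five characters: "1pv" is left behind
```

With the full uid the tags reduce to [url]...[/url]; with the truncated one they come out as [url1pv]...[/url1pv], which is the kind of debris that showed up in the imported posts.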

The fix is rather daunting to implement. Basically, you have to script something that looks at every single post and every single signature, finds bbcode_uids therein, matches the first five characters to the bbcode_uid field in the posts table (just as a check), and then updates the bbcode_uid for each post to the match found (this is after altering your table to make the bbcode_uid column accommodate eight characters, of course). If you get this wrong, you’ve basically wrecked your whole database, and bbcode for posts in the past will never render correctly. Of course, if you’ve discovered this problem before anybody has posted to your site, then you can alter the database and reimport the data, but this isn’t an option if people have been using the site for a few days before the issue was reported. Luckily, I was able to come up with a pretty simple script to fix the issue. Of course I was terrified to push the start button, so to speak, but push it I did, and it worked.
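
The core of that matching step can be sketched in a few lines of shell. This is not the fix script itself, only an illustration of the idea: recover_uid is a hypothetical helper, and it assumes uids are eight alphanumeric characters that directly follow a colon at the end of a tag. A real fix would still have to loop over every post and signature, widen the column first, and write the recovered value back, ideally against a backup.

```shell
#!/bin/bash
# Sketch: recover the full eight-character uid from post text, given the
# truncated five-character value stored in the bbcode_uid column.
# recover_uid is a hypothetical helper, not part of phpbb.
recover_uid() {
    local text="$1" short="$2"
    # Find ":<short>" plus three more characters right before a closing bracket,
    # keep the first hit, and drop the surrounding ":" and "]".
    printf '%s\n' "$text" \
        | grep -oE ":${short}[A-Za-z0-9]{3}\]" \
        | head -n 1 \
        | tr -d ':]'
}

post='[quote:d98cJ1pv]stuff[/quote:d98cJ1pv]'
recover_uid "$post" 'd98cJ'
```

Checking that the recovered value really begins with the stored five characters, as described above, is what keeps a stray match from overwriting a good row.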

If you’re having the same issue, you can try my fix at your own risk.

Linux in the 21st Century

For years now, I’ve been an avid Linux user. I (half) joke about how crummy Windows is, and I hate when I have to support Windows, though I’m really not as much of a Linux zealot as you might think. I have to confess that there’s a part of me that likes knowing how to do arcane things that lots of other people don’t know how to do. See all that text scrolling by in my simple terminal window? That’s me installing software, bucko. No graphical installers with smiling paperclips for me. I really do like understanding how my system works (more or less), being able to look under the hood to troubleshoot things. I like not having to understand how a registry works in order to tweak software (though I do have to know how to edit a text configuration file, which might be as scary to others as a registry is to me). But some of my old school willingness to dispense with usability in favor of a dumb sense of pride and configuration simplicity is wearing off. More and more, I’m finding that there are tools it’d be nice to have that aren’t best implemented in a terminal application. Sure, I could write a program to read a text file I store meeting requests in and send me an email when I’m about to have a meeting, but that takes work and seems not terribly reliable. More and more, I’m looking for tools to handle these sorts of tasks for me, and I’m finding that I like them. I’m emerging from my self-imposed prison of command line solutions and testing out tools that just might help me work like a normal human being, and with some surprisingly good results.

One such tool is Korganizer, the KDE desktop manager’s calendar and organizer tool. In recent months, I’ve been required to attend many more meetings than in the past, and trying to keep them all straight has been a pain. I had tried using Mozilla’s Sunbird calendar program at various times in the past, and it’s a fine piece of software, but it clutters up my workspace. In addition to my mail window, my browser, my irc client, and my tabbed terminal window, I also had to have Sunbird running, and it just irritated me. So I recently tried Korganizer, which it turns out will hide in your system tray and pop up alerts reliably. I’ve been using it for a couple of months now and really have no complaints. It’s a little sluggish on my system, but not so bad that it keeps me from using it. I can tolerate a little UI lag when adding events if the trade-off is reliable notification of upcoming events, the ability to suspend or dismiss events, reasonable handling of recurrence, and a view of my day or week (or month) that lets me see at a glance what’s on my schedule. And Korganizer has all of these things. It also handles todo lists and journals, which I guess are like meeting minutes. I started using todo lists but found that having to open the app to see them made them less useful. I haven’t played with journaling. There are a bunch of buttons at the top that I haven’t done much with, though I’m sure they’re useful. The system tray utility seems to use up no appreciable resources, and that’s a big win on a system that runs dev mysql and apache servers in addition to all my desktop software. I’m sure there are things that Korganizer could do a lot better (I wish I could see our executive calendars, kept on a remote groupware server; as it is, I’m an island), but it beats holy hell out of hacking together something using text files and output from the “cal” command, and it has become a must-have tool for me.

Next up is Komodo Edit. I’ve taken comfort in the simplicity of the command line and the non-GUI text editor since I became used to editing files in pico and reading mail using pine back in college. When I began doing a lot of programming and learned a lot of the cool things you can do using the vi editor, I couldn’t imagine I’d ever go back to an IDE that would require mouse moves and menu navigation. My fingers are hard-wired to do vi commands now. I can do text replacement in my sleep (want to add a tab to the beginning of lines 23 – 47? type: “<ESC> :23,47s/^/\t/”; oops, wanna undo it? just type “u”; then “:wq” to save and close), and I have trouble editing in any other way. One of my few beefs with vi has always been that it’s hard to do operations that span more than one vertical span of screen real estate. To delete a line range, you have to count lines or look for line numbers and then delete or cut. If you’re trying to move a hundred lines around, this can be a minor pain. A few years ago, I tried out ActiveState’s Komodo IDE. It’s built on top of Mozilla’s code and so is a cross-platform solution. At the time, it was very sluggish and didn’t offer much that interested me. Sure, there was code completion and syntax highlighting, but I can get the latter in vi, and the former almost always winds up irritating me more than it helps me. Plus it cost money to use the non-evaluation version. Recently, ActiveState and Komodo have been in Mozilla news. They’re starting a project to open up parts of their source, it turns out. In reading about this, I learned about Komodo Edit, which is the light-weight version of their pay-to-play editor. It’s free and pretty responsive (probably because it’s doing a lot less junk behind the scenes). Most importantly for my use, it has vi key-bindings. 
So I can fire up Komodo Edit, avail myself of what scrolly and selection capabilities are useful to me, and still do the weird “:23,47s/^/\t/” sort of commands that my fingers are so used to. What’s more, I can define projects and view select files in a sidebar, so I do a lot less typing to navigate my file system when working on projects that require me to edit a number of files. I’ve also discovered that the find and replace helps out sometimes when there’s some regex that I can’t quite work out by hand (e.g. when I want to replace with newlines). I probably use a tiny subset of Komodo Edit’s feature set, but they’re pretty useful. I find that if I’m doing one-off edits or will be staying in one file and toggling to the command line to test (e.g. when working on a perl script to parse a log and display summary info), I do better to stay at the command line, but Komodo Edit is fast becoming not a “must have” but a solid “nice to have when I want it” tool.

My latest interest is in launchers. I never really caught on to Mac OS X’s Quicksilver launcher. It’s not so much that I didn’t get it at some level as that I didn’t see that it was a killer feature for most Mac users, who I think of as people who like to draw pretty pictures more than as people who tend to want to remember abstruse key combinations needed to make a launcher behave in useful ways. But as I find myself more and more trying to get back to documents or applications that are buried in the file system or in menus, I find myself wishing I could just type a couple of keys to pull up the apps or docs. KDE’s Katapult looks very slick and promising, but it’s geared toward KDE applications and interactions, and I can’t seem to pull myself away from the Gnome desktop manager. Although I’ve read that Katapult is easy to extend, documentation seems poor at best, and I suspect you have to drink the KDE Koolaid and know a bit about working with KDE frameworks in order to make much headway. Gnome has an app named gnome-launch-box that is sort of like Katapult, but it’s very ugly. Although you can run it without the window initially on top of other apps, I can’t figure out how to then provoke it (in Katapult, you press CTRL-space and the slick interface appears instantly). It’s pretty responsive in terms of finding and launching folders and applications, and it handles multiple matches (e.g. a list pops up displaying both Korganizer and Komodo Edit if you type “ko”) and seems to be wired for extensibility, but by the developers’ own admission, it’s just not ready for prime time yet. Ubuntu ships with a tool called Deskbar that is a sort of launcher, but it hasn’t worked very well for me so far. It’s hard to predict what results it’ll return and in what order, and though it appears to be fairly extensible, a plugin I wrote for it (actually, I just modified the bugzilla plugin to point to my bugzilla install) is quirky at best.
So while I’m on the hunt for a good launcher, none of the options I’ve found to date quite cut the mustard yet.

Of course I use Flock and Thunderbird. In the next few weeks, Flock will be making a big step toward its original vision for the browser as a social tool. Thunderbird is pretty low-frills but has served my email needs very well for roughly five years now. But these apps are old news for me, so they don’t really fit into this post, which outlines a recent foray into a broader set of GUI apps. In the same category are xchat and OpenOffice.org.

So, there you have it. Back into my dork cave I go. All this time out in the land of the first-class user has instilled in me a craving for a darkened room and the glow of a terminal window flickering up at me in a chunky Courier font.

:wq

Getting POST data from mod_perl2

I learned today after spinning my wheels a lot over the last few days that mod_perl2 under apache can parse POST data only once. I tried any number of ways to get at the POST data, but it was always empty. Finally, I came across something that mentioned that POST data could be parsed only once. The module I’ve been working on was running at the PerlLogHandler phase, which is after the request had already been parsed. So I could try all I wanted to re-parse the POST content, but I was parsing something that didn’t exist. When I attached the module to the earlier PerlHeaderParserHandler request phase (more info on this stuff here), I had access to the POST data. Of course, my grabbing the POST data in this phase makes it unavailable to other phases, which means that if a PHP file in the directory the module screws with tries to access the $_POST array, it’ll be empty and the script probably won’t behave as expected. This is ok for my purposes, as the module I’m writing is for logging data sent from a source that doesn’t expect a response. The point of my module is to get raw POST data and dump it in a log, so destroying the POST variables doesn’t hurt anything in my case.

The Apache2::Request module is designed to parse and cache request data for the duration of a request. Of course, if you don’t instantiate it until a late phase when the POST has already been destroyed (as I was doing), it’s not very helpful. The module also happens to be a minor pain to install, unlike most perl modules I’ve had to deal with.

New site software

I’ve been doing a lot of work in Drupal 5.1 in recent months. I’ve liked Drupal since I first tried an install of it about three years ago. It’s got an active community and improves markedly with every release in terms of front-end usability, back-end code quality, performance, and the ease with which one can do custom development of the software. It’s also got a very well-documented API, and the methods of extending it to make it do more things than it does out of the box really make good sense to me.

My experience with developing for WordPress, which I’ve used as a blogging platform for upwards of three years now, has been somewhat different. I haven’t tried it out in a while, and things have likely gotten better since I did, but my impression upon thinking back on it is that writing plugins for WordPress was like building a popsicle stick house, where extending Drupal is like using a Construx set with pieces that snap nicely and tidily into place. I don’t do much custom development in Drupal for my own site, but I do frequently prototype things at some domain or another that I’ve got hosted, and I’ve decided finally, with the improvements that have gone into Drupal 5.1, to switch over for my own blog.

For every-day bloggers, WordPress wins out hands-down. It’s great software that makes blogging easy, even if you want features that don’t exist in the default install. The hosted version is very nice as well, and though I’ve never engaged very heavily with the WordPress developer community, I understand that it thrives. So my shift here shouldn’t be construed as a slap at WordPress. I’d recommend it over Drupal for anybody who just wants to run a personal blog. I guess I’m switching because I’ve just personally grown to feel at home in Drupal.

Along with the new software comes a new design (if you can call it that). I haven’t checked it out in IE yet (not sure I can bear to try). It uses the (slightly-modified) markup of the default “garland” theme that ships with Drupal, and I’ve added some graphics and tweaked the styles a bit. I’m not sure the green sidebars work. I’m not sure any of it works, but that’s why I write code for a living instead of pushing pretty colors around the page.

In theory, all the old blog posts will link up as they did previously, though I suspect there are some images for old (pre-flickr) posts that I didn’t pull over. I wrote some code to pull my WordPress posts, tags, and comments into Drupal that I’ll make available to anybody who asks, with the understanding that you get what you pay for, minus any guarantee of support or any assurance that it won’t delete your whole blog and break up with your girlfriend in a most crude and cruel way. I found an importer or two that just didn’t do the trick for me, so I wrote one that you just run at the command line and that seems to work pretty well within its limited scope.

To emulate the functionality of my old blog, I installed the following Drupal modules that aren’t shipped in core:

  • Akismet (spam control)
  • Archive (doesn’t come with a calendar, and I wrote a custom sidebar block for it)
  • Pathauto (for auto-generated friendly URLs)
  • Tagadelic (to show the tag cloud in the sidebar)

So, there you have it. If you’re a regular reader and something’s broken, please do let me know.

phpMyAdmin and Designer mode

Anybody who’s done any open source development knows about this nifty tool called phpMyAdmin that lets you manipulate mysql databases through a web interface. For Luddites like me, using the command line interface is usually preferable to using any sort of GUI tool that requires pesky mouse moves and clicks, so I’ve generally deployed phpMyAdmin to be used by others who needed to screw with databases but haven’t been comfortable with (or had access to) the command line.

One of my tasks for today is drawing up some diagrams to show relationships between tables in a big fancy dimensionally modeled database I’m trying to blunder my way through designing. The problem is that I utterly hate all the tools I’ve tried for this on Linux. What I wanted was something that would read in a mysql schema file and populate the basic diagram for me so that all I had left to do was to map out the relationships. There’s a UML-diagram application called Umbrello that does entity-relationship diagrams of the type I need to deliver, and a user has contributed a script that will read in a schema file (though it’s finicky). I don’t like the application itself very much, but it fit my basic requirements. This morning, it started crashing, though, and was unusable. Luckily, I hadn’t done much manipulation of the model in it yet. So the quest for a better tool was back on.

I tried various versions of DBDesigner4 and MySQL Workbench, even going so far as to try running the Windows versions in wine when I experienced problems with the Linux versions, but nothing worked out.

Finally, I decided to take another look at phpMyAdmin to see if it had been enhanced with any sort of modeling capabilities. And it has! In recent versions, you can enable a “Designer” view by uncommenting a couple of lines in the config file and slurping in the tables for the phpmyadmin database in the scripts directory. It’s not a perfect tool by any means, but when I click the “Designer” tab in the app, it shows me a nice DHTML view of the tables that lets me drag them around, specify relationships, toggle to show or hide tables that have no relationships defined, etc. It comes with a nifty little palette and a toggleable sidebar to handle these operations, and it’s really a pretty elegant little piece of work. What’s more, and what makes this really useful for my purposes, is that I can save the frame (phpMyAdmin keeps a navigation frame open on the left) to my local disk, zip it up, and send it to somebody, who can then perform the same DHTML manipulations I was able to perform, making it ideal for sending along a complex schema that can have portions of it disabled for ease of viewing. (To clarify, they can save none of the information back to the database, but there’s some degree of flexibility with respect to how they can control the static view.) And to top it all off, if I want to make changes to tables, I can do it at the command line or right there in phpMyAdmin, and there’s no re-importing of a schema; the Designer view will be up to date the next time I reload it. In Umbrello, it’s my impression that any changes I made after import (e.g. drawing relationships) could not be exported back out in a useful way for porting back to the database, so I would constantly have been updating the schema, importing, and redrawing relationships.

This is a great tool for my purposes, and of course phpMyAdmin’s core features are also very useful in many environments.

Flexible Drupal surveys

A few weeks ago, my company needed to publish a survey with a pretty flexible layout. Had we been limited to the one-field-per-row layout that Drupal’s survey module allows for, we would have had a very long and ugly form when in fact what we wanted was a nice tidy grid of small form controls that was much less imposing for users to consider filling out. So I hacked our old version of the survey module (for Drupal 4.6) to add a “layout” field to the fields tab for a survey. In the layout field, those who can administer surveys can specify markup and drop form fields into the markup using numeric placeholders wrapped in curly braces. Surveys for which no layout is defined use the default layout with numeric field weighting. I’ve submitted a patch to incorporate this functionality, and you can find the bug report with attached patch here if you’re in need of such functionality before the patch gets reviewed or if the survey module developer declines to integrate it. The patch is for Drupal 5.x.
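
The placeholder idea itself is simple enough to show in a few lines. The sketch below is only an illustration in shell (the patch is PHP inside the survey module): render_layout, the table-row markup, and the sample fields are all made up, and it assumes placeholders look like {1}, {2}, and so on.

```shell
#!/bin/bash
# Sketch of the placeholder idea: {1}, {2}, ... in a layout string are
# swapped for rendered form fields, in order.
# render_layout and the sample markup are hypothetical.
render_layout() {
    local layout="$1"; shift
    local i=1 field pat
    for field in "$@"; do
        pat="{$i}"                        # literal placeholder, e.g. {1}
        layout=${layout//$pat/$field}     # replace every occurrence
        i=$((i + 1))
    done
    printf '%s\n' "$layout"
}

render_layout '<tr><td>{1}</td><td>{2}</td></tr>' \
    '<input name="age">' '<input name="zip">'
```

Two fields end up side by side in one table row instead of each taking a row of its own, which is the whole point of the layout field.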

On a related note, I created a patch for the forms module that lets you add HTML between form elements. This hack/patch arose out of a need to stick a quick text snippet between two fields, where the snippet couldn’t be contained gracefully in the “description” line of the topmost of the fields. I add a form field type “html” that spits out whatever HTML you specify. I wrote this code (or the old version of it; this patch too is for Drupal 5.x) when my company needed explanatory HTML between two fields but the need to manipulate the form fields themselves hadn’t arisen yet. This functionality is good for insertion of quick snippets of HTML, where the first patch I mention above is best for layout overhauls.

Both patches probably represent security risks on sites that allow non-administrators to create forms or surveys, as I’m not (as yet) filtering content, so anybody who can create a survey can add arbitrary HTML. So apply the patch with that in mind.

Styling Drupal 5.x search forms

I’m working on a project that requires me to apply a fancy pants style to a Drupal search form. I thought this would be simple enough, as it’s pretty easy to override default themes for pretty much everything else in Drupal, but it turns out either that I’m a dolt or that there’s not much clarity out there on this topic. After screwing around with a lot of things (e.g. poring over debug_backtrace() output, writing die() statements all over the place, temporarily hacking the search and node modules, etc.), I searched Drupal’s site and found a promising link that turned out not to be the solution I needed (it simply didn’t work).

At long last, I tried creating a theme function named mytheme_search_form(). The search module has a function named search_form(), but in all my hair-pulling, I never saw anything that indicated that you could override this function by prepending your theme name to it (I would have expected to find calls to “theme('search_form', $args)” somewhere). At any rate, I ultimately created the above-named function and gave it the following definition:

function mytheme_search_form($form){
        return _phptemplate_callback('search_theme_form', array('form' => $form));
}

Then I created a file in my template directory named “search_theme_form.tpl.php” and built a custom form.

Next up was adding additional search fields to the form. To do this, I looked at node.module, where the function node_form_alter() adds fields to the default form. It didn’t seem necessary to jump through that hoop since my form was mostly hard-coded anyway, so I just added some radio buttons with the name “category” so that I could filter search results by taxonomy. Simple enough. But the search never actually filtered my results by category. Here I did more hair-pulling and weird debugging. Finally, I went back to the hook_form_alter() functionality, having noticed a “processed_keys” key in the $form array. So to make my form honor my category search, I added the following things to my template.php file. It’s not clear to me how much of this is necessary, and I rather suspect I’m doing something stupid here, but it seems to work and I’m on deadline, so I’m rolling with it.

function mytheme_search_keys($type = null){
        $keys = search_get_keys();
        if($type){
                $keys = search_query_insert($keys, 'type', $type);
        }
        if(!empty($_POST['category'])){
                $keys = search_query_insert($keys, 'category', $_POST['category']);
        }
        return $keys;
}

/**
 * Taking over this function so that I can call mytheme_search_validate to do the advanced search.
 */
function blog_form_alter($form_id, &$form){
        if($form_id == 'search_form'){
                $form['#validate']['mytheme_search_validate'] = array();
        }
}

/**
 * Need to call this to add category to the processed_keys array item so that
 * the category actually gets searched in the mini-advanced form we generate
 * in mytheme_search_form().
 */
function mytheme_search_validate($form_id, $form_values, &$form){
        // Rebuild the keys with the node type and category filters applied.
        $keys = mytheme_search_keys($form_values['module']);
        form_set_value($form['basic']['inline']['processed_keys'], $keys);
}

function faq_search($op = 'search', $keys = null){
        switch($op){
                case 'name':
                        return t('content');
                default:
                        $keys = mytheme_search_keys('faq');
                        return node_search('search', $keys);
        }
}

function forum_search($op = 'search', $keys = null){
        switch($op){
                case 'name':
                        return t('content');
                default:
                        $keys = mytheme_search_keys('forum');
                        return node_search('search', $keys);
        }
}

function blog_search($op = 'search', $keys = null){
        switch($op){
                case 'name':
                        return t('content');
                default:
                        $keys = mytheme_search_keys('blog');
                        return node_search('search', $keys);
        }
}

Now for an explanation. The “mytheme_search_keys()” function is a helper that lets me make sure I’m limiting the search to a given node type. It’s not clear to me that this is absolutely necessary, but things seem not to work if I don’t add the “type:” string to the search, so I’m leaving it in. Note that this function also looks for $_POST['category'] and adds it to the keys. If I wanted to add other search fields, I’d add them here as well. I suppose that since mytheme_search_validate() calls this function to set keys, I could eliminate the extra function and just do the same work in mytheme_search_validate().
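
Since the posted category goes straight into the keys string, it may be worth checking it first. Below is a small standalone sketch of one way to do that. It assumes the category values are plain numeric IDs (as in “category:3”), and sketch_clean_category() is a made-up name, not part of Drupal or of the code above.

```php
<?php
// Hypothetical helper for illustration only.
// Returns the category as a string of digits, or null when the value is
// missing or contains anything other than digits.
function sketch_clean_category($input) {
    if (!isset($input['category'])) {
        return null;
    }
    $raw = $input['category'];
    // ctype_digit() is true only for non-empty strings made up of 0-9,
    // so a value carrying spaces or extra "option:value" text is rejected.
    if (is_string($raw) && ctype_digit($raw)) {
        return $raw;
    }
    return null;
}

var_dump(sketch_clean_category(array('category' => '3')));
var_dump(sketch_clean_category(array('category' => '3 type:page')));
var_dump(sketch_clean_category(array()));
```

In mytheme_search_keys(), something like this could be called with $_POST, skipping the search_query_insert() call whenever it returns null.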

Next up, blog_form_alter(). The hook_form_alter() functions are associated with modules, so I chose one I knew my site would be using that didn’t have a form_alter hook defined already. It feels kind of hacky, but it seems to work. The idea here is that we need to make the form run a validation function in order to add the keys we’re pulling in from mytheme_search_keys(). It was when I added this code that the category filter actually started working, so it seems to be a crucial bit. The key seems to be adding the keys to the $form['basic']['inline']['processed_keys'] array item, which seems to handle adding the search criteria to the URL and to the search itself.

Finally, I added the three _search() functions, which again feel a little extraneous, but the thing doesn’t work unless I add them, so they’re staying put for now. All we’re doing in these functions is adding the node type to the keys being searched (the search code extracts things like “type:blog” and “category:3” from the query to do advanced searches) and then executing the node_search() function with the revised $keys value.
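
To make the “type:blog” and “category:3” business a bit more concrete, here is a rough standalone sketch of how such “option:value” tokens can ride along in the keys string. The sketch_query_insert() and sketch_query_extract() functions are simplified stand-ins I’m using for illustration; they are not Drupal’s search_query_insert() and search_query_extract(), and the real ones may differ in detail.

```php
<?php
// Simplified stand-ins for illustration; NOT Drupal's own implementations.

// Pull the value of an "option:value" token out of the keys string,
// or return null if the option is not present.
function sketch_query_extract($keys, $option) {
    $pattern = '/(^| )' . preg_quote($option, '/') . ':([^ ]*)( |$)/i';
    if (preg_match($pattern, $keys, $matches)) {
        return $matches[2];
    }
    return null;
}

// Add (or replace) an "option:value" token in the keys string.
function sketch_query_insert($keys, $option, $value = '') {
    // Drop any existing token for this option so it is not duplicated.
    $pattern = '/(^| )' . preg_quote($option, '/') . ':[^ ]*/i';
    $keys = trim(preg_replace($pattern, '', $keys));
    if ($value !== '') {
        $keys = trim($keys . ' ' . $option . ':' . $value);
    }
    return $keys;
}

$keys = 'vmware clone';
$keys = sketch_query_insert($keys, 'type', 'blog');
$keys = sketch_query_insert($keys, 'category', '3');
echo $keys, "\n";
echo sketch_query_extract($keys, 'category'), "\n";
```

The point is just that the filters travel inside the same string as the search words, which is why adding them to the keys is enough to narrow the search.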

So, there you have it, the long way around to having a custom-themed Drupal search form with additional filters based on node type and category. Hope this saves somebody a few of the 10 or so hours I spent staring incredulously at my screen as solution after solution failed to work.