Cloning a VMware Server VM

I recently needed to make a bunch of clones of a VMware virtual image on my VMware Server. After doing a few by hand, I got tired of it and wrote a little script to do it for me. The script assumes that there’s a working set of virtual image files in a directory named “vm-template” and that the virtual machine name defined in the template is also “vm-template.” You can change these values by changing the SOURCE_DIR and SOURCE_NAME variables at the top of the file (or you could modify the script to set these variables using passed arguments). Whatever image you wind up cloning, make sure it isn’t running while you copy it, or you may get unpredictable results.
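A guard against cloning a running image could look something like the following. This is only a sketch: it assumes VMware leaves entries ending in .lck in the image directory while a VM is powered on (the same kind of lock files the script deletes from the copy), and vm_looks_running is a name made up for illustration.

```shell
# Hypothetical helper: returns 0 (true) if the given image directory
# contains any *.lck entries, taken here as a sign the VM may be running.
# Assumption: VMware's lock files/directories end in ".lck".
vm_looks_running() {
    dir=$1
    for entry in "$dir"/*.lck; do
        # If the glob matched nothing, the literal pattern comes back
        # and the -e test fails, so we fall through to "not running".
        [ -e "$entry" ] && return 0
    done
    return 1
}

# Example use before copying:
#   if vm_looks_running "$SOURCE_DIR"; then
#       echo "$SOURCE_DIR appears to be in use; shut the VM down first." >&2
#       exit 5
#   fi
```

A stale lock left behind by a crash would trigger a false alarm, so treat a positive result as a prompt to go look rather than as proof.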

To use the script, just run it, passing the desired new virtual image name as an argument. A directory will be created using that name (so avoid spaces or weird characters; escaping them might also be ok). Files from the image to be cloned will then be copied into the new directory. In the copied .vmx file, the SOURCE_NAME value will be replaced with the name you pass as an argument, and all files will be renamed to use the argument’s name rather than the SOURCE_NAME. To clarify: Typically, your source image will live in a directory named (for example) “vm-template” and will be full of files named (for example) “vm-template.vmx,” “vm-template.vmdk,” etc. The script renames any such matching files to use the argument passed, and it changes references to those names within the .vmx file to point to the renamed files.
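To make the renaming concrete, here is a small preview of the per-file rename the script performs. It is a sketch under assumptions: clone_name is a hypothetical helper, “web01” is an invented new VM name, and the file list is just a plausible directory listing.

```shell
# Hypothetical helper: print the name a file would get after cloning.
# Replaces the first occurrence of the source VM name with the new name.
clone_name() {
    file=$1 src=$2 dst=$3
    case $file in
        *"$src"*)
            # Text before and after the first occurrence of $src.
            before=${file%%"$src"*}
            after=${file#*"$src"}
            printf '%s\n' "$before$dst$after"
            ;;
        *)
            # Name does not mention the source VM; it is left alone.
            printf '%s\n' "$file"
            ;;
    esac
}

# Preview the renames for a typical template directory listing.
for f in vm-template.vmx vm-template.vmdk vm-template.nvram vmware.log; do
    printf '%s -> %s\n' "$f" "$(clone_name "$f" vm-template web01)"
done
```

Files that don’t contain the source name (vmware.log here) come through unchanged, which matches the script only touching files that match SOURCE_NAME.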

If your source image is large, it could take a few minutes to copy the files. The rest of the process goes quickly. Once you’re done, if you’re using VMware Server, you’ll want to pick the option to add a new machine to the inventory. Then browse to the new .vmx file. When you boot it up, you should see (I’m using the web console here) a warning asking you whether you copied or moved the image. This is because we didn’t do anything to change the GUID (VMware calls it a UUID) that identifies the image. Tell VMware Server that you copied it, and it should make any necessary adjustments and boot your image.
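If you’re curious about the identifiers behind that question, they live in the .vmx file. As a rough sketch (assuming the usual uuid.bios and uuid.location keys are present; show_vm_uuids and the example paths are illustrative names only), you can print them for the template and the clone. Right after the copy they should be identical, which is why VMware asks.

```shell
# Hypothetical helper: print the UUID lines from a .vmx file.
# Assumption: the file uses the "uuid.bios" / "uuid.location" keys.
show_vm_uuids() {
    grep -E '^uuid\.(bios|location)[[:space:]]*=' "$1"
}

# Example (paths are illustrative):
#   show_vm_uuids vm-template/vm-template.vmx
#   show_vm_uuids web01/web01.vmx
```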

Once it’s booted, you’ll need to make adjustments such as changing the hostname and applying any patches or updates that may have landed since you created the source image.

The script I used follows.

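#!/bin/bash
# Clone a (powered-off) VMware Server image into a new directory.

SOURCE_DIR="vm-template"    # directory holding the image to clone
SOURCE_NAME="vm-template"   # VM name used in its file names and .vmx

DEST_NAME=$1

if [ -z "$DEST_NAME" ]; then
    echo
    echo "Please specify a VM name as the sole argument"
    echo
    exit 1
fi

if [ -e "$DEST_NAME" ]; then
    echo
    echo "$DEST_NAME already exists. Please specify another."
    echo
    exit 2
fi

if [ ! -d "$SOURCE_DIR" ]; then
    echo
    echo "Source directory $SOURCE_DIR not found."
    echo
    exit 3
fi

echo

mkdir "$DEST_NAME" || exit 4
echo "Copying source files to $DEST_NAME directory"
echo
cp -R "$SOURCE_DIR"/* "$DEST_NAME"/ || exit 4

cd "$DEST_NAME" || exit 4

# Rename every file whose name contains the source VM name.
for file in *"$SOURCE_NAME"*; do
    [ -e "$file" ] || continue
    new=${file/"$SOURCE_NAME"/$DEST_NAME}
    mv -- "$file" "$new"
done

# Point the .vmx at the renamed files; \Q...\E keeps the name literal.
SRC="$SOURCE_NAME" DST="$DEST_NAME" \
    perl -pi -e 's/\Q$ENV{SRC}\E/$ENV{DST}/g' *.vmx

# Remove stale lock files copied from the template.
rm -rf -- *.lck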
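
The source could also come from arguments rather than hard-coded variables. One way that might look is sketched below. It is an illustration only: the optional second and third arguments are an invented convention, and set_clone_vars and clone-vm.sh are hypothetical names, not part of the script above.

```shell
# Hypothetical replacement for the variable block at the top of the script.
# Usage: clone-vm.sh NEW_NAME [SOURCE_DIR [SOURCE_NAME]]
set_clone_vars() {
    DEST_NAME=${1:-}
    # Fall back to the original default when no source directory is given.
    SOURCE_DIR=${2:-vm-template}
    # By default, assume the VM name matches the directory name.
    SOURCE_NAME=${3:-$(basename "$SOURCE_DIR")}
}

# In the script itself this would be called as:
#   set_clone_vars "$@"
```

Defaulting SOURCE_NAME to the directory’s base name keeps the common case (directory and VM named the same) down to a single extra argument.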

Stage

For a long time at my day job, one of our big web site issues has been the staging of database-driven content. Particularly if you’re editing Drupal pages that have a lot of markup in them, publishing a node can be sort of scary, as it goes live instantly with any bugs you’ve introduced. In theory, Drupal’s preview feature can be used to view your changes before you commit to them, but this too is scary, as the content isn’t rendered exactly as it will be once published. Further, using vanilla Drupal with its preview function to stage content requires that you roll out changes one by one. If you want to group changes for a mass rollout, the best you can do is wrap your changes in html comments and uncomment them one by one during deployment, hoping you don’t fat-finger anything in the process. I’ve always thought this would be a pretty difficult problem to solve, but yesterday, I came up with what feels like a satisfactory method for staging content.

The new stage module addresses both safety-netted staging of individual content and management of change sets.

It works by tapping into Drupal’s revision system, which already allows you to track changes to content over time and to revert to older content. For specified types of content, any additions or edits are published using the normal Drupal workflow, but on publish, the revision number is pinned at its last blessed point. You can edit or add any number of documents, and they all remain pinned at their pre-edit revision until you roll the whole batch of changes forward. When you roll a batch forward, all the revision numbers are brought to their most recent and pinned there until the next deployment. In the administration section, you identify staging and production servers. If you view an affected node from one of the specified staging hosts, you see the latest copy; if you view it from a production host, you see the pinned version.

This workflow is ideal for environments in which fairly frequent milestones are deployed. Because of Drupal’s handy dandy revision system, you can compare versions of the content across pushes to see what’s changed.

The module is hot off the presses this morning and so is probably still buggy and feature-poor, but it’s a start.

Lonely

I’ve been using my 3-year-old gmail account more and more for things lately. I created it back when gmail was still in invite-only mode because I could, but I’ve never had much use for it because I’ve preferred to have my mail stored locally in an email client that didn’t necessitate that I keep a gmail tab open in my browser. But I’ve been more and more irritated by spam lately and have even thought about changing my email address. Problem is, my personal email address is old enough that it’s attached to accounts and lists I can’t even remember that I might still need to get info from. I had noticed that spam control on my old gmail account seemed to be pretty good. So this weekend, I looked into hosted gmail for my domain. It was actually very easy to set up. Within a half hour of starting my investigation, I started getting mail at my own personal instance (for all practical purposes) of gmail. And I’ve been lonely ever since. Only one or two spams have actually gotten to my inbox (and some 500 were successfully blocked in the past day), and this paucity of spam has actually helped me to garden a few other emails I get but am not interested in. I’ve just continued to delete them along with spam for however many years, but now that I get no spam, I’m finding it easy to weed out these other annoying messages (e.g. from lists I no longer care about). The result, of course, is that now I’m lonely. My old email client (Thunderbird) caught mucho spam, but mucho still got stuck in my box, so there was always something of a clatter simulated by the clutter. Now that’s gone, and I’m rattling around in my own inbox, listening for the whisper of footsteps, peeping out the front window for visitors.

Hard Drive Enclosure

We have an old laptop whose AC adapter input is busted. It’ll stay powered on for just a minute or two before shutting off. This means of course that we can’t keep it on for long enough to charge the battery. Unfortunately, we have a year’s worth of photos of Lennie on this computer’s hard drive. This week, I went in search of a hard drive enclosure that would allow me to pull photos off the drive and then go on to use the drive as a portable external disk. I’ve bought two so far, and neither one has hardware that’ll accept the pins on my drive. This is where I hang my head in shame and admit that I don’t know very much at all about hardware. ATA vs. IDE vs. SCSI? It’s all Greek to me. Here I implore my friends who do know about hardware to help me out. Drive specs follow, along with photos of the pins.

Some strings from the back that look like model numbers:
HDD2188
MK8025GAS

Cylinders: 16383

Compaq P/N: 312954-001




Update: A coworker took a quick look at the images and said he guessed the piece pictured was a connector that could be removed. Sure enough, I gave a slight tug and the thing wiggled. So I tugged a little more, and there was a regular old IDE connector plugged into it. I currently have the drive mounted and am yanking pictures off of it. Whew.

Google Docs

I’m usually pretty leery of using online services that I don’t administer for things that matter to me. For example, I’ve resisted a number of times using Google’s calendar for work purposes because there’s potentially sensitive information being posted to the calendar. So not only do I not have control over leaks of the data, but I don’t have control over backups, uptime of the service, etc., and this seems a lot of liability for something I need to make sure I’ve got access to. (Honestly, though, I think the smart folk over at Google are probably generally more competent than I am to guarantee uptime, backups, etc. — comparative benefits packages would suggest as much, at least.)

I’m very satisfied with one aspect of Google’s online service, however, and I’m consistently able to put aside my paranoia to use Google Docs for collaboration. Now I’d never store an important sensitive prose/text document there, but for planning server maintenance, the spreadsheet application is hard to beat. You share a document with everybody who’s involved, and everybody can view and edit the document at the same time. This past weekend, I was tasked with taking another shot at setting up replication between some mysql servers. We’ve set this up in the past but have lost confidence in the validity of the replication. So a coworker and I made another go of it this weekend. In preparation, I made a punch list of our steps, from putting up downtime pages and blocking access to the database at the firewall to pasting in commands for dumping data and resetting meta-data. I was able to color-code the steps by server so that it was easy to tell at a glance on what hardware to perform a step. And then as we went through the steps, we could update columns describing who performed a step and when. Of course, we’re coordinating this in a chat window as we’re doing the work, but it’s neat to watch the spreadsheet being updated interactively as we go, and this method provides a really simple, nice way to collaborate and keep a record of the process. Since the data’s not terribly sensitive (provided you don’t put passwords in), hosting it elsewhere doesn’t give me the heebie jeebies, and it’s nice to have a centralized repository of past maintenance events to build on for future maintenance. If there were a version you could download and install on your own hardware, I’d do it in a heartbeat and even use the apps for sensitive data, but then how would Google watch your every move and deliver search results based on the documents you create?

Network Solutions squatting

Last night, I was poking around to try to find a domain name for a little hobby project I’m working on. I’ve always found Network Solutions’s domain search tool to be pretty useful because it comes back with a grid of all common TLDs and their availability, and it keeps a list of all the ones you’ve searched for in your session, which is handy if you’re having to try a lot of fairly similar versions of the name you’re looking for. Finally, last night, I found a name I liked, but I wanted to sleep on it to make sure. This morning, I went to my usual registrar of choice (GoDaddy) to buy the domain, and it was listed as unavailable. I went back to NetSol’s tool to see if they had the same result (surely somebody hadn’t purchased the name in the few hours since I had discovered it). NetSol showed the domain available. Hmmm. So I looked at the whois info and found the domain registered to NetSol with a creation date of yesterday. I’ve always been a little suspicious that registrars might log domains searched for and hold them or sell them to squatters for a hefty fee, but I’ve never personally been bitten by the practice. But here we have confirmation that NetSol is essentially squatting on domains that people express interest in. So, why is this a big deal? Two reasons. One, it’s a pain to have accounts with multiple registrars. I did it for years and have finally in the last couple of years managed to consolidate my domains with one registrar that has proven to be a good vendor. Two, there’s a substantial price difference. NetSol charges $35 per year for a domain, while GoDaddy charges around $10. I went ahead and bought the domain through NetSol and immediately did a transfer request to move the domain to GoDaddy. I guess NetSol sort of has the right to do this, but it seems pretty crappy. I searched on a bunch of names last night, and a few that were available, I wound up deciding not to get. 
That means that the next time somebody searches for those domains through another registrar, they’ll show as unavailable and won’t be purchased. I guess I paid a $15 service fee for using NetSol’s tool. Had I known that a search constituted license to squat and gouge, I wouldn’t have used the service, and henceforth I won’t.

Phpbb3 import error: bbcode_uid truncation

I recently upgraded an install of phpbb to phpbb3. Shortly thereafter, I moved the site that the forum runs on to different hardware after several days of downtime on the original hardware (and an unresponsive vendor). To move to the new hardware, I dumped the database to a text file, compressed it, and shot the database and all site files across the network to the new hardware. Then I uncompressed the database and slurped it into mysql. Simple enough. What I hadn’t considered in advance was the fact that I was moving from mysql4 to mysql5. Accordingly, some weird things started happening when I started testing the site on the new hardware. I googled around a bit to discover that some of the problems were a result of the mysql upgrade, and I finally found this script, which purports to solve the problems by modifying the database structure. The script seemed to work just fine. The problems I had seen went away, and I figured the migration was a success.

But then somebody in the forums pointed out that bbcode throughout the site was messed up. And sure enough, all posts that had been imported had weird extra characters appended to bbcode blocks, which kept the bbcode from being converted into the appropriate html. For example, a block of bbcode might look like this: [quote="username"scd]stuff[/quote:scd]. But the characters were never consistent across posts. A bit more googling turned up the fact that phpbb has a field called bbcode_uid that is supposed to allow eight characters, but either when moving from mysql4 to mysql5 or as part of that nifty script I ran (I’m not sure which), the field gets truncated to five characters, which lops off the last three characters of an eight-character bbcode_uid, which ultimately results in the weird display we found.

What’s going on is that parsing nested tags (e.g. “[quote][b][url][/url][/b][/quote]”) can become laborious for the server, especially when tags don’t get closed properly. To make it more surefire and to simplify the process, phpbb appends a bbcode_uid to any bbcode inserted. So when you type “[url]http://daryl.learnhouston.com[/url]”, what actually gets inserted into the database is something like “[url:d98cJ1pv]http://daryl.learnhouston.com[/url:d98cJ1pv]”. This makes it so that you’re not having to figure out arbitrary nesting, because every opening tag has a corresponding unique end tag; you don’t have to find a beginning tag’s mate by parsing a string recursively, in other words. It’s a really cool idea. Of course, to remove the bbcode_uids from posts as a page is built, you need to store the bbcode_uid associated with a given post, so that it can be stripped out once tags are matched to one another. This is the bbcode_uid field in the posts table. And this field has just been truncated to five characters by the database move. Which means that when phpbb tries to find the bbcode_uid value within a given post, it finds and strips out only the first five characters, which results in three weird characters being appended to bbcode tags and the improper display of bbcode. In every single post and every single signature of your forums, which in my case was nearly 200,000 posts.

The fix is rather daunting to implement. Basically, you have to script something that looks at every single post and every single signature, finds bbcode_uids therein, matches the first five characters to the bbcode_uid field in the posts table (just as a check), and then updates the bbcode_uid for each post to the match found (this is after altering your table to make the bbcode_uid column accommodate eight characters, of course). If you get this wrong, you’ve basically wrecked your whole database, and bbcode for posts in the past will never render correctly. Of course, if you’ve discovered this problem before anybody has posted to your site, then you can alter the database and reimport the data, but this isn’t an option if people have been using the site for a few days before the issue was reported. Luckily, I was able to come up with a pretty simple script to fix the issue. Of course I was terrified to push the start button, so to speak, but push it I did, and it worked.

If you’re having the same issue, you can try my fix at your own risk.
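The core matching step can be sketched in shell. This is an illustration of the idea, not the fix mentioned above: recover_bbcode_uid is a hypothetical helper, and it assumes uids are made up of letters and digits only. Given the truncated five-character value from the posts table and the raw text of the post, it pulls the full eight-character uid back out of the post’s own tags.

```shell
# Hypothetical helper: given the truncated five-character bbcode_uid and
# the raw post text, print the full eight-character uid found in the text.
# Prints nothing if no tag in the text starts with the truncated value.
recover_bbcode_uid() {
    short_uid=$1
    post_text=$2
    # Match ":<short uid>" plus three more characters up to the closing "]",
    # keep the first hit, then strip the surrounding ":" and "]".
    printf '%s\n' "$post_text" |
        grep -oE ":${short_uid}[A-Za-z0-9]{3}\]" |
        head -n 1 |
        tr -d ':]'
}

post='[url:d98cJ1pv]http://daryl.learnhouston.com[/url:d98cJ1pv]'
recover_bbcode_uid d98cJ "$post"   # prints d98cJ1pv
```

A real fix would run this kind of match for every post and signature and write the recovered value back only after widening the column to eight characters, ideally against a backup first.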

Getting POST data from mod_perl2

I learned today after spinning my wheels a lot over the last few days that mod_perl2 under apache can parse POST data only once. I tried any number of ways to get at the POST data, but it was always empty. Finally, I came across something that mentioned that POST data could be parsed only once. The module I’ve been working on was running at the PerlLogHandler phase, which is after the request had already been parsed. So I could try all I wanted to re-parse the POST content, but I was parsing something that didn’t exist. When I attached the module to the earlier PerlHeaderParserHandler request phase (more info on this stuff here), I had access to the POST data. Of course, my grabbing the POST data in this phase makes it unavailable to other phases, which means that if a PHP file in the directory the module screws with tries to access the $_POST array, it’ll be empty and the script probably won’t behave as expected. This is ok for my purposes, as the module I’m writing is for logging data sent from a source that doesn’t expect a response. The point of my module is to get raw POST data and dump it in a log, so destroying the POST variables doesn’t hurt anything in my case.

The Apache2::Request module is designed to parse and cache request data for the duration of a request. Of course, if you don’t instantiate it until a late phase when the POST has already been destroyed (as I was doing), it’s not very helpful. The module also happens to be a minor pain to install, unlike most perl modules I’ve had to deal with.

New Laptop (and Ubuntu)

I’ve got a training session coming up in a couple of weeks for which I’m required to have a Windows laptop. Of the three laptops I owned at the time of my learning this, one is broken, one is 4 years old and just doesn’t have the power to run the software I’ll need to run, and my main one runs Linux. Sounded to me like it was time to get a new laptop. Doing so would afford me the opportunity to take both a Windows laptop and my Linux box to the training (where I’ll be doing a full day’s training and then, if possible, also a full day’s regular work each day, which necessitates that I have handy the box I can actually be productive on). It would also afford Mleeka an upgraded system after I got back from training. So I went to amazon and dropped a grand on another SONY Vaio and an extra GB of RAM. This is a significant upgrade for me, as I’m currently running half a GB and will be running 2GB when the new RAM arrives.

The new computer got here yesterday, and of course I love it. My inclination was to wipe Vista from it first thing, but then I googled around about dual-booting with Vista, and it turns out to be really easy. Vista lets you partition the disk without even so much as a reboot, which for Windows is shocking. So I shrank the Windows partition and booted from the disk for Ubuntu Feisty. I had to mess around with gparted for a while to get the partitioning just right, but then I installed Ubuntu, and now I have a dual-boot system. So Mleeka gets her upgrade a little early. I started moving files from the old laptop to the new yesterday afternoon, and by now, I’m completely migrated.

I can’t say enough about how smoothly the Ubuntu install went. I’ve posted a number of times in the past about all the hoops I’ve had to jump through to get various Linuxes (Linuces?) installed on various laptops, but this time, it just plain worked. No wrangling with xorg.conf to get the widescreen or touchpad to work. No more special network card drivers.

The only rough patches I’ve hit so far are that networking does seem a touch flakey. I can get to other boxes on my network without issue, but I’ve had to restart my wireless interface a number of times this morning because I couldn’t hit things outside the network. So I’ll need to work that out. And when I opened up my laptop this morning, I was surprised to see that it had shut down. Turns out that the power management settings default to hibernating when the laptop is closed, and hibernate, intentionally or not, equals hard shutdown on this hardware. That’s easily enough resolved with a change to power management settings.

So in two weeks, I’ll be off to my training with just one laptop that happens to be blazing fast. My next step is to make a virtual machine of another Windows install so that I can run it in vmware during the training and never have to leave the comfort of a bootup into Linux.

phpMyAdmin and Designer mode

Anybody who’s done any open source development knows about this nifty tool called phpMyAdmin that lets you manipulate mysql databases through a web interface. For Luddites like me, using the command line interface is usually preferable to using any sort of GUI tool that requires pesky mouse moves and clicks, so I’ve generally deployed phpMyAdmin to be used by others who needed to screw with databases but haven’t been comfortable with (or had access to) the command line.

One of my tasks for today is drawing up some diagrams to show relationships between tables in a big fancy dimensionally modeled database I’m trying to blunder my way through designing. The problem is that I utterly hate all the tools I’ve tried for this on Linux. What I wanted was something that would read in a mysql schema file and populate the basic diagram for me so that all I had left to do was to map out the relationships. There’s a UML-diagram application called Umbrello that does entity-relationship diagrams of the type I need to deliver, and a user has contributed a script that will read in a schema file (though it’s finicky). I don’t like the application itself very much, but it fit my basic requirements. This morning, it started crashing, though, and was unusable. Luckily, I hadn’t done much manipulation of the model in it yet. So the quest for a better tool was back on.

I tried various versions of DBDesigner4 and MySQL Workbench, even going so far as to try running the Windows versions in wine when I experienced problems with the Linux versions, but nothing worked out.

Finally, I decided to take another look at phpMyAdmin to see if it had been enhanced with any sort of modeling capabilities. And it has! In recent versions, you can enable a “Designer” view by uncommenting a couple of lines in the config file and slurping in the tables for the phpmyadmin database in the scripts directory. It’s not a perfect tool by any means, but when I click the “Designer” tab in the app, it shows me a nice DHTML view of the tables that lets me drag them around, specify relationships, toggle to show or hide tables that have no relationships defined, etc. It comes with a nifty little palette and a toggleable sidebar to handle these operations, and it’s really a pretty elegant little piece of work. What’s more, and what makes this really useful for my purposes, is that I can save the frame (phpMyAdmin keeps a navigation frame open on the left) to my local disk, zip it up, and send it to somebody, who can then perform the same DHTML manipulations I was able to perform, making it ideal for sending along a complex schema that can have portions of it disabled for ease of viewing. (To clarify, they can save none of the information back to the database, but there’s some degree of flexibility with respect to how they can control the static view.) And to top it all off, if I want to make changes to tables, I can do it at the command line or right there in phpMyAdmin, and there’s no re-importing of a schema: the Designer view will be up to date the next time I reload it. In Umbrello, it’s my impression that any changes I made after import (e.g. drawing relationships) could not be exported back out in a useful way for porting back to the database, so I would constantly have been updating the schema, importing, and redrawing relationships.

This is a great tool for my purposes, and of course phpMyAdmin’s core features are also very useful in many environments.