30 March 2006

XML based CVs

XML plus an XSLT toolchain is an excellent way to maintain your Curriculum Vitae, supporting several languages and several formats with minimum effort. I've been using the XML Résumé Library for my CV for a while, but the lack of recent updates and a slight dissatisfaction with the look of the PDF output made me want to take a look at what else is out there.

I didn't find much, though: my overall feeling is that, while not perfect, the XML Résumé Library is still the safest bet out there.

The sole exception worth mentioning is the work done by David Sora for a subject of his master's degree. He designed an XSD schema together with HTML and PDF XSLT stylesheets for CVs based upon the Europass Curriculum Vitae layout. His report is written in Portuguese, but an example CV and XSLT are available for English too (in the parent directory of the report). This might not be picked up by a community (such as the one behind the XML Résumé Library), but the schema is complete and the output looks nice. The layout of the Europass CV is also comprehensive and professional, as you'd expect from an initiative backed by the European Union. So it is definitely something to look into whenever I need to tweak or drop the XML Résumé Library.

19 March 2006

Detecting the insertion/removal of USB modems with udev

udev has replaced hotplug in the Debian distribution. However, not all of hotplug's functionality is available (or at least simple to use): with hotplug one could easily write scripts that processed add and remove events, while with udev that has proved to be quite an ordeal.

The device I wanted to detect was the SpeedTouch USB ADSL modem. The first problem I ran into was that "sysfs values are not readable at remove, because the device directory is already gone". The solution is either to use an environment variable with the DEVPATH (which didn't work for me), or to match the device with the reduced information still available (my only remaining option). Thankfully, the PRODUCT environment variable is still set on remove and matches the device precisely. This is how the udev rules look:

# /etc/udev/rules.d/z80_speedtch.rules

BUS=="usb", SUBSYSTEM=="usb", SYSFS{idVendor}=="06b9", SYSFS{idProduct}=="4061", ACTION=="add", RUN+="/bin/sh -c '/usr/local/sbin/speedtouch &'"
BUS=="usb", SUBSYSTEM=="usb", ENV{PRODUCT}=="6b9/4061/0", ACTION=="remove", RUN+="/bin/sh -c '/usr/local/sbin/speedtouch &'"
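The two rules name the same device in different notations: the sysfs attributes are zero-padded four-digit hex strings, while PRODUCT, as far as I can tell, is the vendor ID, product ID and device release number printed as unpadded hex and joined by slashes. Here is a small Python sketch of that conversion, assuming a device release of 0 as in the rule above (the `product_env` helper is made up for illustration, it is not part of udev):

```python
def product_env(id_vendor, id_product, bcd_device="0000"):
    """Build a PRODUCT-style value from zero-padded sysfs hex strings."""
    fields = (id_vendor, id_product, bcd_device)
    # Parse each field as hex and print it again without leading zeros.
    return "/".join("%x" % int(field, 16) for field in fields)


print(product_env("06b9", "4061"))
```

For the SpeedTouch IDs this prints `6b9/4061/0`, which is the string the remove rule matches against.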

The action I wanted to take was to start or stop the ppp interface. The second problem is that the above rules match many add/remove events (the driver and several USB subdevices). To ensure that only one add/remove action is taken, a solution is to use the SEQNUM environment variable, whose value is an always-increasing integer, and keep track of its value when the device was first inserted. This is what /usr/local/sbin/speedtouch looks like:

#!/bin/sh

RUN=/var/run/speedtouch.seqnum

TIMEOUT=60

# test whether the device is currently added or not
device_added () {
        test -e $RUN && test `cat $RUN` -lt $SEQNUM
}

# wait for the "ADSL line is up" kernel message to appear
wait_for_adsl_up () {
        local TIME

        dmesg -c > /dev/null
        TIME=0
        while ! dmesg | grep -q 'ADSL line is up'
        do
                sleep 1
                TIME=$(($TIME+1))
                test $TIME -ge $TIMEOUT && return 1
        done
}

case $ACTION in
        add)
                # ignore repeated "add" actions
                device_added && exit
                echo $SEQNUM > $RUN

                wait_for_adsl_up

                ifup ppp0
                ;;
        remove)
                # ignore repeated "remove" actions
                device_added || exit
                rm -f $RUN

                ifdown ppp0
                ;;
esac

The script has a bit of extra magic for waiting until the ADSL line is up, which was taken from the SpeedTouch Linux kernel driver homepage.
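To make the SEQNUM bookkeeping easier to follow, here is a rough Python sketch of the same decision logic, run against a made-up sequence of events. It only mirrors the add/remove filtering; the `handle_event` function, the temporary state file and the sequence numbers are assumptions for illustration, and nothing here talks to udev or brings up an interface.

```python
import os
import tempfile


def device_added(state_file, seqnum):
    """True if an earlier event already recorded the device as added."""
    if not os.path.exists(state_file):
        return False
    with open(state_file) as f:
        return int(f.read()) < seqnum


def handle_event(state_file, action, seqnum):
    """Return True if the event should trigger ifup/ifdown, False if ignored."""
    if action == "add":
        if device_added(state_file, seqnum):
            return False  # repeated "add" for the same insertion
        with open(state_file, "w") as f:
            f.write("%d\n" % seqnum)
        return True
    if action == "remove":
        if not device_added(state_file, seqnum):
            return False  # repeated "remove", state file already gone
        os.remove(state_file)
        return True
    return False


# Hypothetical burst of events for one insertion and one removal.
state = os.path.join(tempfile.mkdtemp(), "speedtouch.seqnum")
events = [("add", 100), ("add", 101), ("add", 102),
          ("remove", 200), ("remove", 201)]
results = [handle_event(state, action, seqnum) for action, seqnum in events]
print(results)
```

Only the first add and the first remove come out as True; every other event in the burst is ignored, which is the behaviour the shell script relies on.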

07 March 2006

Mix'n'matching

Have you ever done mental math to figure out how best to fit a collection of data onto a set of DVDs, trying to squeeze the most onto every single DVD? It happens to me more and more, so I wrote a Python script to do it for me.

The algorithm used to efficiently find the largest path combinations below a threshold is inspired by the Apriori algorithm for association rule discovery. Since any combination that fits under the threshold is made up of smaller combinations that also fit, we can start from single paths, combine those with the initial paths to make two-item sets while discarding all that exceed the threshold, then build three-item sets, four-item sets, and so on, until no larger combination below the threshold can be found.
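As a toy illustration of that level-by-level growth, here is a minimal Python 3 sketch with made-up names and sizes. It is a simplified stand-in rather than the script itself: the `combinations_below` name is hypothetical, and it leaves out the sort-and-break shortcut the real script uses to skip items that can no longer fit.

```python
def combinations_below(limit, sizes):
    """Level-wise search; sizes maps an item name to its size."""
    # Level 1: every single item that fits on its own.
    level = {frozenset([name]): size
             for name, size in sizes.items() if size <= limit}
    found = dict(level)
    while level:
        next_level = {}
        for names, total in level.items():
            for name, size in sizes.items():
                # Grow each surviving set by one item it does not hold yet,
                # keeping only the results that still fit under the limit.
                if name not in names and total + size <= limit:
                    next_level[names | {name}] = total + size
        found.update(next_level)
        level = next_level
    return found


sizes = {"a": 3, "b": 4, "c": 5}  # made-up sizes
found = combinations_below(8, sizes)
best = max(found, key=found.get)
print(sorted(best), found[best])
```

With a limit of 8, the pair b and c (total 9) is dropped at the second level, so no three-item set is ever attempted and the best combination is a and c.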

Here is the script:

#!/usr/bin/env python
# mixnmatch.py - find combination of files/dirs that sum below a given threshold
# -- Jose Fonseca

import os
import os.path
import optparse
import sys

from sets import ImmutableSet as set


def get_size(path):
    if os.path.isdir(path):
        result = 0
        for name in os.listdir(path):
            result += get_size(os.path.join(path, name))
        return result
    else:
        return os.path.getsize(path)


def mix_and_match(limit, items, verbose = False):

    # filter items
    items = [(size, name) for size, name in items if size <= limit]
    # sort them by size
    items.sort(lambda (xsize, xname), (ysize, yname): cmp(xsize, ysize))

    # initialize variables
    added_collections = dict([(set([name]), size) for size, name in items])
    collections = added_collections

    while True:
        if verbose:
            sys.stderr.write("%d\n" % len(collections))

        # find unique combinations of the recent collections 
        new_collections = {}
        for names1, size1 in added_collections.iteritems():
            for size2, name2 in items:
                size3 = size1 + size2
                if size3 > limit:
                    # we can break here as all collections that follow are
                    #  bigger in size due to the sorting above
                    break
                if name2 in names1:
                    continue
                names3 = names1.union(set([name2]))
                if names3 in new_collections:
                    continue
                new_collections[names3] = size3

        if len(new_collections) == 0:
            break

        collections.update(new_collections)
        added_collections = new_collections

    return [(size, names) for names, size in collections.iteritems()]


def main():
    parser = optparse.OptionParser(usage="\n\t%prog [options] path ...")
    parser.add_option(
        '-l', '--limit',
        type="int", dest="limit", default=4700000000,
        help="total size limit")
    parser.add_option(
        '-s', '--show',
        type="int", dest="show", default=10,
        help="number of combinations to show")
    parser.add_option(
        '-v', '--verbose',
        action="store_true", dest="verbose", default=False,
        help="verbose output")
    (options, args) = parser.parse_args(sys.argv[1:])

    limit = options.limit

    items = [(get_size(arg), arg) for arg in args]

    collections = mix_and_match(limit, items, options.verbose)
    collections.sort(lambda (xsize, xnames), (ysize, ynames): -cmp(xsize, ysize))
    if options.show != 0:
        collections = collections[0:options.show]

    for size, names in collections:
        percentage = 100.0*float(size)/float(limit)
        try:
            sys.stdout.write("%10d\t%02.2f%%\t%s\n" % (size, percentage, " ".join(names)))
        except IOError:
            # ignore broken pipe
            pass


if __name__ == '__main__':
    main()

This script has also been posted as a Python Cookbook Recipe.