Monday, December 22, 2008

When other sites discard or refuse your email

We've covered setting up your sendmail to act as a relay for certain computers. Now, we look at another relaying problem.

You like running your own sendmail, and you're using it to manage your own email accounts. You could use your ISP's mail server for all outbound messages, but let's say you're not doing that. Now, some third party, maybe another ISP (let's call them "Dogers"), decides to silently discard all email coming from IP blocks owned by your ISP unless the sending IP number belongs to one of your ISP's mail servers. Even if you're running a responsible sendmail on a static IP number, messages sent to "Dogers" just vanish.

The solution is to arrange your sendmail so that, when sending to certain domains, it relays the messages through your ISP's servers. We'll need two more features for this. First, the mailertable feature will allow you to use a different mailer for certain addresses. Second, depending on your ISP, you may have to authenticate yourself with the ISP's server before it will relay your messages. This configuration will show how to perform that authentication.

Make sure your .mc file contains the following two lines before the first "MAILER" line:
FEATURE(`authinfo',`hash /etc/mail/auth/client-info')dnl

Also, add the following line anywhere in the file:

You will need to have cyrus-sasl installed, and configured for logins. Here is a sample cyrus-sasl configuration invocation:
./configure --prefix=/usr/local --enable-anon --enable-plain \
--enable-login --disable-krb4 --with-mysql \
--with-saslauthd=/var/state/saslauthd --with-openssl=/usr/local/ssl \
--with-plugindir=/usr/local/lib/sasl2/ --enable-cram \
--enable-digest --enable-otp --without-des

OK, now the mailertable entry. Add a line for the dogers domain, telling your sendmail to forward mail for those addresses through your ISP's server:      smtp:smtp.<MY>.<ISP>

Now, to authenticate with the ISP. We told sendmail that our credentials would be stored in /etc/mail/auth/client-info, so we create a file there:
AuthInfo:smtp.<MY>.<ISP> "U:root" "I:wintertoad@<MY>.<ISP>" "P:<password>" "M:LOGIN"

Then, we just have to rehash the mailertable and authentication files with a command like this:
# makemap hash file.db < file

Now, assuming you've rebuilt your .cf file after the changes we made to the .mc file above, you can just send a SIGHUP to the sendmail processes, and you should be able to send email to anybody at the "Dogers" domain by relaying those messages through your ISP's mail server.

Sunday, July 13, 2008

A Curious Permissions Problem with ALSA

While I'm waiting for kde4 to reach everyday usability (defined for my purposes as supporting panel auto-hiding), I periodically check out an updated subversion tree of kde4, compile it, and try it out.

There's some pain with switching between kde3 and kde4 and back again, so I try kde4 with a different login, specially created for testing kde4.

There are still bugs in kde4, particularly when you compile subversion trees, and not specific tagged releases. In my latest foray into kde4, several applications did crash, including amarok, the audio media player.

After spending some time in kde4, I logged out and switched back to kde3 with my regular username. Once there, I found that there was no audio from ALSA applications. So, the usual course when this happens is to examine the permissions on the appropriate audio devices. In this case, however, all of the permissions looked fine.

So, what was preventing ALSA applications from running? When I ran strace on an ALSA application, I found that I was getting permissions problems (reported as EPERM on the return from a syscall) on semctl() syscalls. So, the next step is to run ipcs. This shows the SYSV IPC resources currently in use. There, I found two shared memory segments and two semaphores that were owned by the "kde4" username. Since all kde4-owned processes had exited, this indicated that some process had experienced an abnormal exit without releasing some SYSV IPC resources.

ALSA uses such resources when applications want to generate sound, and it is not possible for an unprivileged user to obtain or release the resources of another user. This produced the permission problems and prevented the applications from working correctly with ALSA.

The solution was to become the root user, and use the ipcrm command to release all resources owned by the kde4 user. Once that was done, ALSA applications run as my regular username could, once again, produce sound.
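
The cleanup can be sketched in shell. This is a minimal sketch rather than a record of the exact commands I typed: it assumes the Linux ipcs column layout (id in the second column, owner in the third), and the helper name stale_ipc_ids is made up for illustration.

```shell
#!/bin/sh
# Print the IPC ids owned by a given user, reading "ipcs" output on stdin.
# Assumes the Linux layout: key, id, owner, perms, ...
stale_ipc_ids() {
    awk -v user="$1" '$3 == user { print $2 }'
}

# Typical use, as root (kde4 is the test login from this post):
#   for id in $(ipcs -m | stale_ipc_ids kde4); do ipcrm -m "$id"; done
#   for id in $(ipcs -s | stale_ipc_ids kde4); do ipcrm -s "$id"; done
```

Filtering on the owner column keeps the loop away from segments and semaphores that belong to processes still running under other usernames.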

Thursday, May 29, 2008

When distributions patch wrongly

Events of recent weeks have provided another reason one might be inclined to avoid the use of distributions. Let's call this the "debian SSL bug". A patch applied by a well-meaning Debian coder made cryptographic keys generated by numerous applications on that distribution entirely useless. Details can be found here.

The Debian patch affected derived distributions as well, such as Ubuntu. For almost two years, many cryptographic transactions were severely compromised. The biggest problem was that the patch was not correctly passed back to the development team of the OpenSSL project. Had it been, they would have pointed out its fatal security implications, and this entire headache would have been avoided.

I always feel uncomfortable when I see distributions applying patches against the original sources. There can be several reasons for these patches.
  1. They may be back-porting selected bugfixes to an earlier version of a library rather than including the latest version of the library with all of its new and possibly untested features.
  2. They may be modifying a logo or informational string to include something specific to the distribution.
  3. They may be changing some default pathnames or other resources to mesh better with the idiosyncrasies of their own distribution.
  4. They may be changing the appearance of the interface to make it more consistent with other applications.
  5. They may be applying changes that the original maintainers of the package do not consider necessary, but which the distribution maintainers find desirable.
  6. Other...
None of these motivations will usually convince me to apply foreign patches. Your opinion may differ.

Sunday, April 27, 2008

Web browsing behind the great firewall of China

I sometimes spend time in China, and while there, I work remotely to my office and to my home computer. I do somewhat technical work that sometimes requires online research, and it's annoying that a significant fraction of non-Chinese sites are unreachable from China.

The thing to remember is that the firewall isn't there to keep me from working. I'm a Canadian passport holder, and they really don't care what I read while in China. That explains certain curious omissions, such as the fact that TCP port 22 (ssh) is not blocked.

So, here I am, in China, with a Linux laptop, and I'd like to browse the web. Rather than take my chances with the firewall, I proxy the connection through my home computer's apache daemon.

So, first I set up the proxy service on my apache. Make sure you've built the httpd with these configuration options:
--enable-mods-shared="proxy proxy-http proxy-connect"

These settings turn on the proxy service and set it to proxy HTTP traffic. The "proxy-connect" flag allows the httpd to be used as a reflector for SSL connections. If you want to visit a banking website, the data still travels as SSL between your laptop and the bank. The home machine just reflects the traffic to the bank without knowing what's in the data stream. It cannot decode that data; if it could, that would count as a man-in-the-middle compromise of the SSL stream.

Next, add some lines to the httpd configuration file. Mine's in /etc/apache/httpd.conf.
LoadModule proxy_module modules/
LoadModule proxy_http_module modules/
LoadModule proxy_connect_module modules/

<IfModule mod_proxy.c>
ProxyRequests On

<Proxy *>
Order deny,allow
Deny from all
Allow from
</Proxy>
</IfModule>

What this does is to enable proxying, but only on connections from localhost. I don't want my httpd to be a proxy for any random person in the outside world.

Next, I set up my ssh on connections to my home computer. You can either add a switch like this to the invocation:
-L 8080:

or you can add a line to your ~/.ssh/config entry for the connection to the home computer:
LocalForward 8080

Now, you ssh into your home computer.

Finally, you start up firefox, and select the menu item:
Select "Manual proxy configuration", and point your HTTP and SSL proxies at "localhost" with the port number 8080.

That's it. Now, when you browse websites, the HTTP-related data stream appears simply as a pile of encrypted bits over your ssh connection. The firewall cannot know what websites you're visiting; it can't even tell that you're visiting a website at all.

Important note: this system proxies the HTTP data. That means web pages, frames, images in the page, RSS feeds, and so on. It does not proxy UDP or post-connection traffic, like youtube videos. If your web browser has a plugin that downloads data from an external site, that plugin may not be using your proxy.

If you want to know what data is not passing through your proxy, you can run tcpdump in another window. Something like this:
tcpdump 'host <IPNUM> and not port 22'

where <IPNUM> is the IP number of your external interface (not the loopback address). You may have to add a "-i" switch if your laptop has more than one network interface. This command will show you all traffic that is not going over the ssh connection.

Wednesday, April 2, 2008

Fixing sound in Linux Civ:CTP

When my old 64-bit motherboard died, and I replaced it with the DP35DP, one of the surprises I ran into was that the sound was badly broken on "Civilization:Call To Power". All other applications that I tried worked well, any programs using the ALSA interface, as well as a few 32-bit binaries on the OSS interface, like Quake 2 and Heroes 3. However, with Civ:CTP, the sound stuttered and looped horribly. I couldn't use the aoss wrapper because Civ:CTP is statically linked. After a lot of tinkering, I finally came to the conclusion that, for this one application, I had to load the sound module with different parameters.

For every application except Civ:CTP, my snd-hda-intel module is loaded with the parameters
position_fix=1 model=5stack

However, in order to run Civ:CTP, I have to exit all sound applications, remove the snd-hda-intel module, and re-load it with the parameters:
position_fix=3 model=5stack

With this change, the sound in Civ:CTP sounds fine. However, all other applications have poor sound, scratchy and unpleasant to the ear, so I only make this module change just prior to running the game, and re-load the module with the usual parameters immediately afterwards.
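
The module swap can be wrapped in a small shell function. This is a sketch under assumptions: it uses modprobe for both the unload and the reload, and the function name reload_hda and the RUN variable are invented here for illustration.

```shell
#!/bin/sh
# Reload snd-hda-intel with a given position_fix value.
# Set RUN=echo to print the commands instead of executing them.
reload_hda() {
    ${RUN:-} modprobe -r snd-hda-intel &&
    ${RUN:-} modprobe snd-hda-intel position_fix="$1" model=5stack
}

# As root, with all sound applications closed:
#   reload_hda 3    # just before starting Civ:CTP
#   reload_hda 1    # back to the usual setting afterwards
```

The unload will fail if any application still has the sound device open, which is why everything using sound has to be closed first.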

Wednesday, March 19, 2008

Experiences compiling X11R7

Compiling and installing X11R7 (release 7.3) was a bit rougher than the X11 compiles I used to perform. I used the build script supplied with the source packages. When it finished, apparently successfully, there were two problems whose solutions were not obvious.

First of all, OpenGL worked on my NVidia box, but not my ATI laptop.

Second, my Chinese fonts for traditional characters in Emacs looked different, much worse. The simplified Chinese characters still looked fine.

OK, what were the problems, and how did I fix them? First of all, the OpenGL issue. I compiled X11 the same way on both computers, so why did OpenGL not work on the ATI laptop? Well, both NVidia and ATI ship closed-source binary blobs with support libraries. The difference is that NVidia supplies its own, while ATI uses the one from the X11 installation. So, somehow I was failing to compile and install the OpenGL stuff. This didn't matter for the NVidia case, because it supplied all of the libraries required, but ATI doesn't do that.

I had compiled and installed libMesa, so OpenGL should have worked. The OpenGL component is compiled as part of the xorg-server-1.4 package, and its configure script is executed by the build script that came with the sources. Aha, but in order to compile OpenGL, you have to provide the configure script with the path of the libMesa source tree. The build script doesn't do that, so OpenGL is not built.

The solution is to interrupt the build at the point where the xorg-server-1.4 is about to be built (you can edit the script and put in an 'exit 0' there, for instance), then configure, build, and install the xorg-server-1.4 archive by hand, remembering to tell it where the mesa source tree is located. Once that completes, you can continue the build with the xorg script (I just commented out all entries above the server compile and resumed).

Now, the font problem. My TTF fonts are in /usr/share/fonts, and I verified that the files there were being read when I asked Emacs to display Chinese characters. So, it appeared as if the Chinese TTF fonts were the ones that were looking bad. A bit of research showed that Emacs does not, as of version 22.1.1, use scalable fonts. So I decided that it probably wasn't supposed to be using those TTF fonts. Now, I had kept my old X11R6 tree around in case of issues like this, so a quick comparison of directories showed that there were some Chinese PCF fonts in the old install that I had forgotten to copy to the new location. So, I copied these files into their location in the X11R7 tree, and Emacs was restored to its former behaviour with respect to the displaying of Chinese fonts. The fonts, by the way, are taipei15.pcf, taipei16.pcf, taipei24.pcf, taipei24k.pcf, and taipeil24.pcf.

Tuesday, March 11, 2008

Selective sendmail relaying based on self-signed keys

Back in the early days of the Internet, people trusted one another not to abuse email. Sure, there were accidents. A badly configured mailing list could fill up with traffic as two vacation programs talked to one another, each informing the other that his latest message would not be read until some later date, because the recipient was out of the office.

In those days, you set up your sendmail to relay messages for others. Many people had email addresses that weren't on a full-time connection to the network; they might be on a BBS that did a nightly download of email, or down some Bitnet rabbit hole. Email was relayed from one intermediate post to another, rather than being simply sent directly from the sender to the receiver. A sendmail daemon that relayed messages for others was helpful to the community, and everybody pitched in to get everyone's email where it was ultimately intended.

Then came new developments. Canter & Siegel, the September that never ended, and the presence of people who would buy things they saw in an unsolicited email message. Spam started to appear in mailboxes. Suddenly, being a helpful person and relaying messages was no longer beneficial to the community, as commercial email senders used relays to hide the origins of their messages. People started turning off open relays on their boxes as a defensive move.

So, now you've got a domain set up with a sendmail daemon at home, and you're traveling with a laptop. To make this a bit more complicated, let's say your laptop is a work computer, and you send email from its sendmail, but with a different domain than your home computer. Everything's working fine, until you find that the coffee shop in Beijing where you're using your laptop has made it onto a list of spamming IP numbers. Some recipients of your messages may not receive them because their sendmail is set up to refuse messages from computers on these bad IP numbers. You know that your home computer is not on a banned IP number, so it would be nice if you could forward your laptop-generated work-related messages through your home computer. It would be even nicer if people selling generic pharmaceuticals could not do the same thing, otherwise your home computer's IP number will very quickly find itself on one of those banned lists. So, you want to allow relaying from your laptop, but only from your laptop, and do it easily even if you move to another coffee shop.

What you want, then, is a way for your home computer to recognize your laptop, and permit only that computer to relay messages through the home sendmail. This will be done with sendmail's TLS facility. You will create a private certificate authority, one you don't have to pay to sign your keys. You'll then use a signed certificate to verify the identity of the laptop. The following procedure will be performed on the home computer; only at the end of this process will the laptop be involved.

We'll start by creating two directories on your home computer, one for the certificate authority, and the other for the signed certificates. I'll use the directory locations that are found in the default OpenSSL configuration file, so that you don't have to edit too many files.
mkdir /etc/mail/CA /etc/mail/certs /etc/mail/CA/demoCA /etc/mail/CA/demoCA/private

Copy the OpenSSL openssl.cnf file into /etc/mail/CA.

Next, we will create the signing certificate.
$ cd /etc/mail/CA
$ openssl req -new -x509 -keyout demoCA/private/cakey.pem -out demoCA/cacert.pem -days 1000 -config openssl.cnf
You will be prompted for several fields, such as country code, location, name. Here's a sample dialogue:
$ openssl req -new -x509 -keyout demoCA/private/cakey.pem -out demoCA/cacert.pem -days 1000 -config openssl.cnf
Generating a 1024 bit RSA private key
writing new private key to 'demoCA/private/cakey.pem'
Enter PEM pass phrase:
Verifying - Enter PEM pass phrase:
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
Country Name (2 letter code) [AU]:CA
State or Province Name (full name) [Some-State]:Ontario
Locality Name (eg, city) []:Toronto
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Example
Organizational Unit Name (eg, section) []:
Common Name (eg, YOUR name) []:Bert Ificate
Email Address []

When prompted, you will have to enter a pass phrase twice. Remember this phrase, you will need it if you ever want to sign certificates with this signing certificate.

This command creates two new files: /etc/mail/CA/demoCA/cacert.pem and /etc/mail/CA/demoCA/private/cakey.pem. Together they hold the certificate and the pass-phrase-protected private key of a certificate signing authority that will be valid for 1000 days.

Next, you must create the certificate that you will use to validate your laptop. You enter the commands:
$ cd /etc/mail/CA
$ openssl req -nodes -new -x509 -keyout laptopcert.pem -out laptopcert.pem -days 365 -config openssl.cnf

Again, you will have to answer some questions. Here is a sample dialogue:
$ openssl req -nodes -new -x509 -keyout laptopcert.pem -out laptopcert.pem -days 365 -config openssl.cnf
Generating a 1024 bit RSA private key
writing new private key to 'laptopcert.pem'
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
Country Name (2 letter code) [AU]:CA
State or Province Name (full name) [Some-State]:Alberta
Locality Name (eg, city) []:Calgary
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Example
Organizational Unit Name (eg, section) []:
Common Name (eg, YOUR name) []:Rhoda Warrior
Email Address []

Now, you have a certificate for your laptop, but it hasn't yet been signed. You use the signing certificate to vouch for the laptop certificate. First, we have to set up a bit more information for the signing process:
$ mkdir /etc/mail/CA/demoCA/newcerts
$ touch /etc/mail/CA/demoCA/index.txt
$ echo 01 > /etc/mail/CA/demoCA/serial
You'll only have to do this the first time you set up a signing authority.

Now, we issue two commands to sign the laptop certificate:
$ openssl x509 -x509toreq -in laptopcert.pem -signkey laptopcert.pem -out tmp.pem
$ /usr/local/ssl/bin/openssl ca -config openssl.cnf -policy policy_anything -out signed-laptopcert.pem -infiles tmp.pem
Once again, there will be a brief dialogue when the second command is run, something like this:
$ openssl ca -config openssl.cnf -policy policy_anything -out signed-laptopcert.pem -infiles tmp.pem
Using configuration from openssl.cnf
Enter pass phrase for ./demoCA/private/cakey.pem:
Check that the request matches the signature
Signature ok
Certificate Details:
Serial Number: 1 (0x1)
Not Before: Mar 12 00:46:43 2008 GMT
Not After : Mar 12 00:46:43 2009 GMT
countryName = CA
stateOrProvinceName = Alberta
localityName = Calgary
organizationName = Example
commonName = Rhoda Warrior
emailAddress =
X509v3 extensions:
X509v3 Basic Constraints:
Netscape Comment:
OpenSSL Generated Certificate
X509v3 Subject Key Identifier:
X509v3 Authority Key Identifier:

Certificate is to be certified until Mar 12 00:46:43 2009 GMT (365 days)
Sign the certificate? [y/n]:y

1 out of 1 certificate requests certified, commit? [y/n]y
Write out database with 1 new entries
Data Base Updated

Now, it's time to tell the home machine's sendmail that it should relay messages received from this key. Add a line to the /etc/mail/access.src file that looks like this:
CertIssuer:/C=CA/ST=Ontario/L=Toronto/O=Example/CN=Bert+20Ificate/emailAd RELAY
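
Note the "+20" in that entry. Sendmail's cf/README describes an encoding for certificate names in map lookups, where a space, "+", and a few other characters are replaced by "+" followed by their hex value. Here is a minimal sketch of that encoding. The function name encode_dn is made up for illustration, and the sketch only covers the seven printable characters listed in the comment, not non-printable ones.

```shell
#!/bin/sh
# Encode a certificate DN for use as a key in sendmail's access map:
# "+" and the characters space < > ( ) " become "+" plus their hex code.
# "+" is handled first so the "+" signs added later are not re-encoded.
# Non-printable characters are not handled in this sketch.
encode_dn() {
    printf '%s\n' "$1" | sed \
        -e 's/+/+2B/g' \
        -e 's/ /+20/g' \
        -e 's/</+3C/g' \
        -e 's/>/+3E/g' \
        -e 's/(/+28/g' \
        -e 's/)/+29/g' \
        -e 's/"/+22/g'
}

encode_dn "/C=CA/ST=Ontario/L=Toronto/O=Example/CN=Bert Ificate"
```

The string passed in is the issuer name built from the answers given in the sample dialogue for the signing certificate, without the email address field.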

You'll have to make that file readable by sendmail:
makemap hash access.db < access.src

And now we have to make sure that the home machine's sendmail knows where to find its certificates and access file. Build a new .cf file from an .mc file that looks something like this:
VERSIONID(` for version 01')
FEATURE(`nouucp', `reject')
FEATURE(`virtusertable', `hash /etc/sendmail/virtusertable')dnl
FEATURE(`genericstable', `hash /etc/sendmail/genericstable')dnl
FEATURE(`local_procmail', `/usr/local/bin/procmail')
FEATURE(`access_db', `hash -T<TMPF> /etc/mail/access')
define(`CERT_DIR', `MAIL_SETTINGS_DIR`'certs')dnl
define(`confCACERT_PATH', `CERT_DIR')dnl
define(`confCACERT', `CERT_DIR/CAcert.pem')dnl
define(`confSERVER_CERT', `CERT_DIR/MYcert.pem')dnl
define(`confSERVER_KEY', `CERT_DIR/MYkey.pem')dnl
define(`confCLIENT_CERT', `CERT_DIR/MYcert.pem')dnl
define(`confCLIENT_KEY', `CERT_DIR/MYkey.pem')dnl

Now, we move some things around a bit. We copy the signing certificate and laptop signed certificate like this:
$ cd /etc/mail/CA
$ /bin/cp signed-laptopcert.pem /etc/mail/certs
$ /bin/cp demoCA/cacert.pem /etc/mail/certs/CAcert.pem
$ cd /etc/mail/certs
$ ln -s signed-laptopcert.pem `openssl x509 -noout -hash < signed-laptopcert.pem`.0
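
Before copying anything to the laptop, it may be worth checking that the signed certificate really does verify against the signing certificate. The following is a sketch, not part of the original procedure. The function name cert_signed_by is invented for illustration, and it looks for the ": OK" line that openssl verify prints on success rather than trusting the exit status.

```shell
#!/bin/sh
# Succeeds only if the certificate in $2 verifies against the
# CA certificate in $1 (openssl prints "<file>: OK" on success).
cert_signed_by() {
    openssl verify -CAfile "$1" "$2" 2>/dev/null | grep -q ': OK$'
}

# On the home machine, after the copy step above:
#   cd /etc/mail/certs
#   cert_signed_by CAcert.pem signed-laptopcert.pem && echo "chain OK"
```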

The three files, demoCA/cacert.pem, laptopcert.pem, and signed-laptopcert.pem, get copied onto the laptop, into its /etc/mail/certs directory. Now, you must tell the laptop's sendmail that these are its certificates. This is done by building (on the laptop) the .cf file from an .mc file that looks roughly like this:
VERSIONID(`$Id:,v 8.1 1999/09/24 22:48:05 gshapiro Exp $')
define(`confCACERT_PATH', `/etc/mail/certs/')
define(`confCACERT', `/etc/mail/certs/cacert.pem')
dnl laptopcert.pem holds the private key; signed-laptopcert.pem is the CA-signed certificate
define(`confCLIENT_CERT', `/etc/mail/certs/signed-laptopcert.pem')
define(`confCLIENT_KEY', `/etc/mail/certs/laptopcert.pem')
define(`confSERVER_CERT', `/etc/mail/certs/signed-laptopcert.pem')
define(`confSERVER_KEY', `/etc/mail/certs/laptopcert.pem')
FEATURE(`local_procmail', `/usr/local/bin/procmail')

Finally, you'll have to decide when you want to relay through the home computer. You really have two choices. You could set it up so that all messages are always relayed through the home computer, by setting a smart relay in your sendmail configuration, or you could relay them explicitly. There are other places that describe the technique for setting up a smart relay, so I'll just describe the second, on-demand version.

If you are trying to send email from your laptop to the user, but want to relay it through your home computer at, you would send the message to this email address:

And there you go, on-demand secure relaying of messages through your home computer.

Thursday, February 28, 2008

Why don't I get spam?

I have an anti-spam trick. It won't work for most people, but there might be some people out there who are inclined to take advantage of it. For the rest, this might be educational.

The trick that I use depends on the fact that I have my own domain. That means I can run sendmail on my computer, and I can create email addresses quickly and easily. I will use the domains example.com, example.net, and example.org for this document, as recommended in RFC 2606.

The basic idea is this: instead of having one email address, I have dozens. I create a new email address for every person with whom I exchange messages, as well as addresses for websites and companies when necessary. If an email address is accidentally revealed, or if one of the companies decides to start sending annoying amounts of unsolicited mail, I simply expire the email address and, if desired, contact the sending party to tell them about the new address. I don't have to contact all of my friends whenever I turn off one address, only the one person who uses that address to talk to me.

OK, how is this implemented? There are two things I have to do. First, I need my sendmail to accept the messages for the active addresses, and send them all to me. Second, I have to ensure that my outbound email has the correct Reply-To: address for the particular recipient of the message.

If you're familiar with sendmail, you can probably guess how I do the first thing. I set up a virtual user table. Here's the .mc file used to make this work:

VERSIONID(` for version 01')
FEATURE(`nouucp', `reject')
FEATURE(`virtusertable', `hash /etc/sendmail/virtusertable')dnl
FEATURE(`genericstable', `hash /etc/sendmail/genericstable')dnl
FEATURE(`local_procmail', `/usr/local/bin/procmail')
FEATURE(`access_db', `hash -T<TMPF> /etc/mail/access')
define(`CERT_DIR', `MAIL_SETTINGS_DIR`'certs')dnl
define(`confCACERT_PATH', `CERT_DIR')dnl
define(`confCACERT', `CERT_DIR/CAcert.pem')dnl
define(`confSERVER_CERT', `CERT_DIR/MYcert.pem')dnl
define(`confSERVER_KEY', `CERT_DIR/MYkey.pem')dnl
define(`confCLIENT_CERT', `CERT_DIR/MYcert.pem')dnl
define(`confCLIENT_KEY', `CERT_DIR/MYkey.pem')dnl
Then, I create a file called /etc/mail/virtusertable.src. It contains entries similar to this:                      error:nouser Spammers found this address myself myself myself myself myself myself myself myself myself error:nouser Spammers found this address myself

The addresses I create for regular correspondence are just successive numbers, plus an unpredictable sequence of two characters to avoid dictionary attacks.
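
Generating such a name is easy to script. Here is a minimal sketch, assuming lowercase letters for the two unpredictable characters and /dev/urandom as the source of randomness. The function name new_local_part and the sequence number 17 are purely illustrative.

```shell
#!/bin/sh
# Build the local part of a new address: a sequence number followed
# by two random lowercase letters.
new_local_part() {
    suffix=$(LC_ALL=C tr -dc 'a-z' < /dev/urandom | head -c 2)
    printf '%s%s\n' "$1" "$suffix"
}

new_local_part 17
```

The printed name, at one of my domains, then goes into virtusertable.src as a new line delivering to the local user.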

Now, recall that sendmail doesn't read the virtusertable.src file, it reads another file called virtusertable.db. I've got a little Makefile in /etc/mail that I use to keep things up to date:
all : genericstable.db virtusertable.db mailertable.db aliases.db access.db

%.db : %.src
	makemap hash $* < $<

aliases.db : aliases

hup : all
	killall -HUP sendmail

Now, I can change the virtusertable file, and when it looks correct, issue (as root) the command:
make -C /etc/mail hup
This will update the appropriate database file, and send a SIGHUP to sendmail, telling that program to reload its databases.

So, that's the receiving side. How about sending? There may be a way to configure sendmail to rewrite the outbound addresses according to a database of recipients, but I haven't figured one out. Instead, I have written a bit of code for my email client, which is rmail mode in Emacs. Here are the relevant bits of Emacs Lisp:
(setq user-mail-address "")
(setq mail-specify-envelope-from t)

;; each entry is (recipient-address address-to-use-for-that-recipient);
;; the entry with a nil key is the default fallback
(setq outbound-address-alist
      '(("" "")
        ("" "")
        ("" "")
        ("" "")
        (nil "")))
(setq full-name "Winter Toad")

;; a function to parse out the header and send email as if from
;; different usernames. That way, I can obsolete a username if it
;; gets spam.
(add-hook 'mail-send-hook
          (lambda ()
            (narrow-to-region 1 (mail-header-end))
            (expand-mail-aliases 1 (mail-header-end))
            (re-search-forward "^To: ")
            ;; parse out the recipient address
            (let (recipient from-whom)
              (cond
               ;; a bare address with no spaces or tabs
               ((looking-at "\\([^ \t]*\\)$")
                (setq recipient (match-string 1)))
               ;; "Full Name <address>"
               ((looking-at "[^<]*<\\([^>]*\\)>$")
                (setq recipient (match-string 1))))
              (setq from-whom (or (cadr (assoc recipient outbound-address-alist))
                                  (cadr (assoc nil outbound-address-alist))))
              ;; put the From: header on its own line, above the To: line
              (beginning-of-line)
              (insert "From: " full-name " <" from-whom ">\n")

              ;; replace whatever follows "Reply-to: " with the chosen address
              (re-search-forward "^Reply-to: ")
              (let ((namestart (point-marker)))
                (end-of-line)
                (kill-region namestart (point-marker))
                (insert from-whom)))

            ;; undo the narrowing so the whole message is visible again
            (narrow-to-region 1 (1+ (buffer-size)))))

What this does is to insert a hook into the mail system when I hit send. A bit of elisp locates the email address in the "To:" field, and tries to match that string to one of the names in the 'outbound-address-alist'. If it finds a match, it inserts the corresponding data into the "Reply-to:" field. If no match is found, or if there are multiple recipients, it uses the default fallback address.

It also sets the envelope sender to the address in user-mail-address, which means that automated replies, such as sendmail daemon warnings and errors, will be delivered to that address. That address should be redirected in the virtusertable to some appropriate address so that you can be notified of problems at the recipient's end (though many systems no longer generate bounce messages, because of spam abuse).

Anyway, with all this, I get virtually no spam. Every few months I may get one message on one of my email addresses, typically one that I used for a forum post or to send a bug report or patch to a mailing list. I retire the address, set up a new one, and never get spam at that address again.

Some time later I'll describe the cryptographic certificates in the mail configuration, and how they allow secure relaying.

Tuesday, February 26, 2008

Installing in non-standard places

I mentioned earlier the possibility of choosing an install prefix like /usr/local/samba, which installs the Samba libraries in a directory that may not commonly exist on distribution-managed machines. One possible effect of this is that you may turn up bugs in configuration and compilation scripts of other packages.

A configure script for another package may accept arguments related to the location of Samba libraries and header files, but compiling the package with these options set might not work. This isn't very surprising, it's a compilation option that is probably rarely used, so bit rot has a tendency to set in. A change somewhere that accidentally breaks the compilation when Samba is installed in an unusual place might not be noticed for some time. By putting Samba in its own directory, you are setting yourself up to test a valid, but rarely exercised option. You may find yourself submitting bug reports and patches to the package maintainers.

As I've said before, maintaining your box without a package manager and distribution is not easy. It's quite a bit more work, but it does force you to understand more about how the system is set up and what it's doing. For people who like the extra control and understanding this provides, this is a useful technique.

Sunday, February 24, 2008

Pharyngula readers in Ottawa

PZ over at Pharyngula reports that readers of his blog are meeting up in various places. Well, if there are any people in Ottawa who are interested in meeting, we can try to set it up here in the comments.

Any place I can get to by bus is fine with me, maybe a weekend lunch time? Possibilities would be
  • Lone Star at Baseline and Fisher
  • Sushi Kan at Baseline and Merivale
  • Some place in Chinatown
Or suggestions from somebody else, I'm not very familiar with the spots to eat in the city, places where a group can sit, eat, and talk for a while.

Thursday, February 21, 2008

A Followup On Cryptographic Mounts, The Bad News

Previously, I discussed cryptographic mounts to hold sensitive data. It's worth pointing out an article that is making the rounds today by 9 authors from Princeton, in which the researchers describe an attack on cryptographic techniques, including the one I've described.

The technique relies on the fact that modern memory can retain its information for several minutes after the computer stops sending it refresh signals. What this means is that a person with physical access to the computer can pull the power connector from the computer and then remove the memory chips, insert them in another computer, and read the cryptographic keys out of the memory. I don't know of a good way to avoid this attack. If the cryptographic volumes are mounted when the computer falls into the hands of the attacker, the data will be, in theory, recoverable.

So, what can be done to prevent the key from being resident in the computer's memory at the instant that the attacker unplugs it? The key has to be available to the operating system so that it can read and write that data in normal operation. Sure, you could get specially modified hardware that deliberately overwrites the main memory from batteries when the power connector is removed, but maybe there's a way to store 128 bits somewhere other than in main memory?

A cache line on a modern CPU is 64 bytes, or 512 bits, big enough to hold four 128-bit keys. Could the operating system subvert the hardware's L1 caching mechanism sufficiently to pin a value in the cache and remove it from L2 and main memory? This attack won't recover data from the L1 cache, so if that's the only place the key is kept, maybe that would be enough. You sacrifice a cache line, but maybe it's worth it?

How about the TLB? That's another part of the CPU that holds data, and that one is explicitly designed to interact with the operating system. Could we find a way to store 128 bits in parts of the TLB, and then deliberately avoid overwriting them? Can the operating system read those numbers back out of the TLB?

Are there any registers that could be used? Probably not on 32-bits, there aren't many registers there, and on 64-bits you'd probably have to use a special-purpose compiler to avoid these registers being touched by a context switch, and avoid them being saved to memory when an interrupt handler runs.

What if you have fifteen keys, all of 128 bits? Well, I believe we could handle that if we had 256 bits of volatile storage space. The first 128 bits of volatile space holds an XOR key, that decodes all of the fifteen keys. The second 128 bits of volatile space holds the decoded key in active use.
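The XOR arrangement can be sketched in a few lines of shell. This only illustrates the arithmetic, not how a kernel would keep the two volatile values out of main memory. The function name xor_hex and the sample key values are made up for the example, and keys are assumed to be 32 lowercase hexadecimal digits.

```shell
# XOR two 128-bit values given as 32 hex digits, 32 bits at a time.
xor_hex() {
    a=$1; b=$2; out=""
    i=1
    while [ "$i" -le 32 ]; do
        j=$((i + 7))
        pa=$(printf '%s' "$a" | cut -c"$i-$j")
        pb=$(printf '%s' "$b" | cut -c"$i-$j")
        out="$out$(printf '%08x' $(( 0x$pa ^ 0x$pb )))"
        i=$((i + 8))
    done
    printf '%s\n' "$out"
}

xor_key=0f0e0d0c0b0a09080706050403020100     # made-up value held in volatile storage
volume_key=00112233445566778899aabbccddeeff  # made-up key for one volume
masked=$(xor_hex "$volume_key" "$xor_key")   # the form that would sit in main memory
xor_hex "$masked" "$xor_key"                 # decodes back to the volume key
```

Each of the fifteen keys would be stored in its masked form, and only the one in active use would be decoded into the second volatile slot.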

Those are my thoughts, anyway.

Wednesday, February 20, 2008

Choosing an install prefix

As noted in this posting, you generally will have to choose an install prefix for software that you are compiling yourself. Most packages you encounter will be configured to install under /usr/local, though some will be configured for /usr.

The first thing you'll want to do is to see if you already have an older version of the software installed anywhere. If the software was previously installed under /usr/local, and you install the new package under /usr, not only will you needlessly consume disk space, but the version that is run will depend on the setting of your PATH environment variable. A user may report that he can't use a certain feature in the new version, and it may take you a while to notice that his environment variable differs from yours, and that he's still running the old software. So, find the name of an executable that you expect will be installed. For example, if you're installing the binutils software, you will expect that the ld binary should be installed somewhere. Next, type the command:
which ld
to see where it is currently installed. If you see it in "/usr/bin/ld", then you'll probably want to use a prefix of "/usr", so that your new versions install over top of the old ones. If, on the other hand, it's in "/usr/local/bin/ld", you'll want a prefix of "/usr/local".
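Since which reports only the first match by default, it can help to list every copy found on the PATH, in search order. A minimal sketch, assuming a POSIX shell; the function name find_all_in_path is made up for this example:

```shell
# Print every executable named $1 found on PATH, in the order PATH is searched.
find_all_in_path() {
    prog=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.            # an empty PATH entry means the current directory
        if [ -x "$dir/$prog" ] && [ ! -d "$dir/$prog" ]; then
            printf '%s\n' "$dir/$prog"
        fi
    done
    IFS=$old_ifs
}

find_all_in_path ld
```

If this prints more than one line, an older copy is still installed under a different prefix, and the first line is the one that runs.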

Sometimes a package installs only one or a few binaries. You may decide to install this into its own directory. For example, I install firefox into the prefix /usr/local/firefox, SBCL into the prefix /usr/local/sbcl, and the apache httpd into /usr/local/apache2. These get their own directories because, while they may install a very small number of executables, they come with a large set of ancillary files. Rather than installing over top of the old directory, I move the old directory to a new location, say "/usr/local/sbcl.old", and then install and test the new version. If the new version doesn't work properly, I can revert to the old one by deleting the new install and renaming the ".old" directory. Alternatively, I can compare the two installations, the previously working one against the new one, and see if there are any obvious differences that could account for problems.
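The move-aside-and-revert routine might look something like the following sketch. The function names set_aside and roll_back are invented for the example, and the paths in the comments are only illustrations.

```shell
# Move an existing install directory aside so a new version can be tested.
set_aside() {
    prefix=$1
    [ -d "$prefix" ] || return 0          # nothing installed yet
    if [ -e "$prefix.old" ]; then
        echo "refusing to overwrite $prefix.old" >&2
        return 1
    fi
    mv "$prefix" "$prefix.old"
}

# Throw away the new install and put the saved one back.
roll_back() {
    prefix=$1
    if [ ! -d "$prefix.old" ]; then
        echo "no saved copy at $prefix.old" >&2
        return 1
    fi
    rm -rf "$prefix" && mv "$prefix.old" "$prefix"
}

# Typical sequence (run as root, paths are examples):
#   set_aside /usr/local/sbcl
#   ... install and test the new version ...
#   diff -r /usr/local/sbcl.old /usr/local/sbcl | less   # compare the two
#   roll_back /usr/local/sbcl                            # only if it fails
```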

Of course, you probably won't be able to type the command firefox and expect it to run if it's installed in /usr/local/firefox/bin/. You will either want to add that directory to the PATH variable, or, more conveniently, put a symbolic link to the appropriate executable from a directory that is in your PATH. This command:
ln -s /usr/local/firefox/bin/firefox /usr/X11/bin/firefox
puts the firefox executable into your PATH, piggy-backing on the /usr/X11/bin entry that is probably there already. Note, however, that if you re-install X11 (we'll get to that in another posting), you might destroy this symbolic link, and you'll have to re-create it then.

So, you really have a couple of choices: put the program into a standard place, like /usr or /usr/local (and, if upgrading, try to install over top of the old version by using the same prefix that was used then), or install the software in its own dedicated directory, like /usr/local/firefox or /usr/local/sbcl.

Now, when you set the prefix in an autoconf configure script, it also sets a number of derived values which can be separately overridden. Configuration files are, by default, put in <prefix>/etc, libraries in <prefix>/lib, headers in <prefix>/include, man pages in <prefix>/share/man (sometimes omitting the 'share' component), log files in <prefix>/var/log, and so on. The configure program lets you override these defaults separately, so that you can put configuration files into, say, /etc/http with the option "--sysconfdir=/etc/http", and so on. Think carefully about whether you want these additional directories to keep their defaults. You probably don't want your X-server log to be in /usr/X11/var/log, nobody will know where to look for it.

Compiling and installing by hand

If you're not using a package manager, or if you are, but there is no package available for a piece of software you'd like to install, you'll find yourself compiling the software by hand. Generally, you start by locating the official web page of the software, downloading an appropriate version of the source code, and extracting the tar file to a directory somewhere.

At this point in the process, you are not doing anything as the root user. You'll become root much later in this process.

The next thing you'll do is look in the top level of the extracted directory for promising looking files, like README, INSTALL, or Makefile. It is likely that you will see an executable script called "configure". It's always a good idea to start by looking at the README and INSTALL files, if present. They may be in the toplevel directory, or in a documentation directory, which will often have a name like "doc", "docs", or "documentation", possibly with different capitalizations.

If The Package Came With A Makefile

If there's a Makefile in the toplevel, that's usually because the software package is fairly small. You will want to look over the Makefile to ensure that it is correct for your intended installation. The most important things to look for are the installation directory and any optional features that might have to be turned on by editing the Makefile. If you can't find the installation directory, type the command:
make -n install

This will ask "make" to print out the sequence of commands that it will be using to install the package. Since you haven't compiled anything yet, it will start with the sequence of commands required to compile your software, so look for the installation commands to occur near the end of the output generated by this command.

If your package came with a Makefile, you will now modify the Makefile if necessary, perhaps changing the installation directory of the product. You should do this before compiling it, because sometimes character strings holding the pathnames of configuration files are inserted into the compiled binary, so changing the installation target after compiling may result in an installation that doesn't work correctly. Editing the Makefile will usually not force a recompilation of the objects under its control, that is the Makefile is not, by default, considered a dependency for the targets in the file.

After this, you will, still as your non-root user, compile the package. This is usually done by simply entering the command make. If errors are encountered during the compile, you'll have to figure out what happened and how to fix it. The most common causes of errors are:
  • missing include files - you might have to add a "-I<directory>" to the CFLAGS, CXXFLAGS, or CPPFLAGS variables in your Makefile.
  • missing libraries - you might have to add a "-L<directory>" to the LDFLAGS variable in your Makefile.
  • bad version - the compilation may depend on a library you have on your machine, but the version you have may not be compatible with the software package. You might have to download a different version of that library and install it before you can continue with the software package.
  • apparent code errors - the compiler may generate errors related to missing variables, bad function declarations, or syntax errors. Resist the urge to correct these immediately, and try to understand why you are seeing these errors. Remember, this package probably compiled for somebody before they released it, why doesn't it work for you? Is it that your compiler is a different version, and flags as errors things that used to be warnings? Is the Makefile configured for the wrong architecture or platform? Something else?
Once you get a clean compile, you're almost ready for the install. I usually prefer to run the command
make -n install | less
once and read through the output, just to make sure that the install isn't going to do something weird. Look for things like configuration files going into /usr/etc, which might not be what you expect, or binaries going into /bin (you should try to keep in that directory only those executables that are necessary to get the computer to boot through its startup scripts up to the point where the network starts up).

At this point, move down to the section of the text called "Installing The Software".

You Have A "" Script, But No "configure" Script

If you have a "" script, but no "configure" script, you'll have to generate the configure script. If there is an executable in this directory with a name like "", run it. This should be sufficient to set up the configure script. If you don't have an autogen script, you should run the commands automake then autoconf. This will often generate warnings, but unless the configure script you generate doesn't run, you can ignore those. So, now you have a configure script, you continue to the next section.

You Have A "configure" Script

If you generated the configure script yourself, you know that it's an autoconf configure script. Sometimes, though, software is produced that has a completely different script that happens to be called "configure". This can be confusing if it doesn't recognize the switch "--help". Start by typing:
./configure --help | less
and look at the output. If it produces a list of options that are available to you, review them carefully and see if there are any optional behaviours that you would like to turn on, or unwanted options that you want to remove (possibly you don't have library support for these, and don't need them). If, instead, the configure script appears to run and do things, you don't have an autoconf configure script, go back and look at the documentation again to see how to use their particular configuration script.

There are a few things to look at in the options you get from "configure". One of them is the prefix location, and choosing that properly can require some care, which is discussed here. For now, let's assume that you've chosen a set of options that look suitable. You re-run the configure script with those options, and without the "--help" option. It will do some things, and it may take a considerable amount of time to run. Eventually, the script should exit, sometimes generating a list of all options and whether or not they are active. Examine this list if present; there might be an option that you want to enable that has been turned off because the configure script failed to find a particular library, in which case you'll have to figure out why that option was disabled and how to get it working. When you're satisfied with the compilation options, type "make". If an error is encountered, see the possibilities mentioned in the earlier section referring to building from a Makefile. If you succeed in compiling the software package, go to the next section, "Installing The Software".

Installing The Software

Now, you can become the root user. Change directory to the location where you compiled the binary, and run
make install
If the thing you're installing has any shared objects (libraries, usually with names that end in ".so", possibly followed by more dots and numerals), you should type
to make sure that the dynamic linker knows where to find the libraries you've just installed.

Many packages these days produce a pkg-config file. This is usually a filename that ends in ".pc", and is installed in a directory like ".../lib/pkgconfig/". The pkg-config application often looks for these files when "configure" is being run, but it has a fairly definite idea of where to look. If your .pc file was installed into a directory where pkg-config doesn't normally look, you'll have to find some way to make this file visible to that program. There are three ways you can handle this:
  • Add the appropriate directory to the system-wide environment variable PKG_CONFIG_PATH. Usually this means editing /etc/profile. You likely want it set at least to "/usr/lib/pkgconfig:/usr/local/lib/pkgconfig:/usr/X11/lib/pkgconfig", but you may want to add more search directories to it, if you expect many packages to be installed in the same prefix.
  • Copy the .pc file into a directory that pkg-config searches. This is unwise, you may install another version of the software some time later, and unless you remember this step your .pc file will still be the old one, causing much aggravation as "configure" insists you still have a version of the package that you know you just replaced.
  • Put a symbolic link to the file from a directory that is searched by pkg-config. Do this if you've got only one or two .pc files in this prefix, and don't expect to put in more.
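For the first option, a line in /etc/profile can add a directory to PKG_CONFIG_PATH without duplicating it on repeated logins. This is only a sketch: the function name add_pkgconfig_dir is made up, and /usr/local/samba/lib/pkgconfig is just an example of a non-standard prefix.

```shell
# Append a directory to PKG_CONFIG_PATH unless it is already listed.
add_pkgconfig_dir() {
    case ":${PKG_CONFIG_PATH:-}:" in
        *":$1:"*) ;;                                   # already present, do nothing
        *) PKG_CONFIG_PATH="${PKG_CONFIG_PATH:+$PKG_CONFIG_PATH:}$1" ;;
    esac
    export PKG_CONFIG_PATH
}

add_pkgconfig_dir /usr/local/samba/lib/pkgconfig       # example prefix
echo "$PKG_CONFIG_PATH"
```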
Test your newly-installed software. It's better to find problems now, when you've just finished installing it and remember what you did, than two weeks from now, when you'd have to go through the whole thing again just to figure out how it's set up.

Two more hints: "configure" writes its command line into a comment near the top of the file "config.log". If you need to remember how you last ran "configure", you will find the options you used there.
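Pulling that line back out can be scripted. The sketch below assumes the usual config.log layout, in which the recorded invocation is indented by two spaces and begins with "$ "; the function name show_configure_invocation is invented for the example.

```shell
# Print the command line recorded near the top of an autoconf config.log.
# Takes the path to the log as an optional argument, defaulting to ./config.log.
show_configure_invocation() {
    sed -n 's/^  \$ //p' "${1:-config.log}" | head -n 1
}

# Example use, from the directory where configure was run:
#   show_configure_invocation
# or, keeping a record for the next upgrade (the destination is up to you):
#   show_configure_invocation > "$HOME/configure-options.txt"
```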

If you have a particularly detailed set of configure options, you might want to record them in a directory somewhere for future reference, both to see quickly what options you enabled when you compiled the software and to re-use the command the next time you recompile it after downloading a new version.

Tuesday, February 19, 2008

Making blogged source code readable

Just a quick note, I've reformatted my source code examples using

Monday, February 18, 2008

Keeping sensitive data on the crypto disks

Previously, I described how to create one or more cryptographic partitions. The data stored on those partitions is not retrievable without the 32-digit hexadecimal key that protects it, the key being constructed from a passphrase input by the user. It may seem that this is sufficient to protect sensitive data, making sure simply to create and edit your files only in that partition. However, there are some subtle details that have to be kept in mind.

Information stored on an unencrypted ext2 or ext3 partition has an unknown persistence. A file that was stored there, and later deleted, may be partially or fully recoverable at some time in the future. To be sure of the confidentiality of your data, you have to make sure that it has never been stored to an unencrypted partition.

If you start up your favourite text editor, telling it to create a new file in some place, let's call it /crypto/sensitive.txt, and then start typing, you may expect that the data never lands on an unencrypted partition. However, there are at least four things to be careful of:
  1. The editor may store information in your home directory, which may not be on the encrypted partition. It might store some of the file contents there, or it might store file metadata. Your editor may keep a table of filenames recently visited in /home, with information about the line number last visited. Your editor might be configured to store crash-recovery autosave files in a directory under your /home directory.
  2. The editor may sometimes store the contents of a working buffer to a file in /tmp.
  3. The computer may come under memory pressure, resulting in some of your data being sent to the swap device.
  4. Your backups may not be as well protected as the files on the cryptographic disk.
The first two points are probably best addressed by ensuring that all of the directories writable by the unprivileged user are on cryptographic partitions. If you only have write permission to the crypto drives, you won't store any files in plaintext. Note, however, that you typically need /tmp to exist and be writable during the bootup of your system, so that partition can't be protected with a passphrase if you care about the system successfully performing an unattended reboot.

So, what do we do about /tmp? Well, one simple solution is an overmount. While you normally mount a partition onto an empty directory, it is legal to mount onto a directory that is not empty. The files that were present in that directory are mostly inaccessible after that (a process with access to file descriptors that it opened before the mount will still be able to operate on those files, but they will be invisible to new open operations by pathname).

We're assuming you have at least one cryptographic partition. So, create a directory on that partition, let's say /crypto/tmp. After you have formatted and mounted your cryptographic partition, run this command. You only have to do this once, the first time you set up cryptographic disks.
mkdir --mode=01777 /crypto/tmp

Now, you can add the following command to the end of the script in the previous post, the script that mounts your formatted disks:
mount --bind /crypto/tmp /tmp

After you've done this, the system will still boot up as usual, using its unencrypted /tmp partition. Then, the root user can run the script from the previous post, now modified to have this extra mount line on the end of it. After entering the passphrase the script will do its work and exit, at which time your /tmp partition will have been replaced with the one in /crypto. Note that if your system starts up in X, with a graphical login screen, you will have to restart it after you have overmounted /tmp, otherwise you will find that X programs fail to work at all. I usually restart X by issuing a simple "killall X" command, and letting the xdm or gdm program start it back up again. This is a lot of trouble, but all manner of things can be stored on your /tmp disk. Firefox will store downloaded files such as PDFs there when there is a helper application ready to use them.

That leaves us with swap. Encrypting the swap space is actually very easy:

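# Encrypt the swap partition with a throwaway random key
hashed=`dd if=/dev/urandom bs=1 count=64 | md5sum | awk ' { print $1 } '`
# Map /dev/hda6 through dm-crypt; the table is supplied in a here document
dmsetup create SWP <<DONE
0 `blockdev --getsize /dev/hda6` crypt aes-plain $hashed 0 /dev/hda6 0
DONE
# Format the mapped device as swap on every boot, then enable it
mkswap /dev/mapper/SWP
swapon /dev/mapper/SWP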

This can run unattended during the bootup. It creates a random cryptographic key using /dev/urandom, a device that supplies cryptographically strong pseudo-random bytes without blocking, even during a system bootup sequence. This random key is used to create an encrypted interface to /dev/hda6. It is formatted as a swap partition, and then enabled. A new key will be generated each time the system boots, so nothing in swap space will survive a reboot. Note that there do exist suspend-to-disk procedures for Linux that store a memory image on the swap partition. If you intend to use such a suspend system, you will have to ensure that it does not attempt to write to the cryptographic swap partition, or you'll have to defer enabling the swap partition until the root user can enter a specific passphrase, thereby allowing you to preserve the contents across a reboot. If you're supplying a passphrase to handle encryption on the swap space, you should not run mkswap, except the first time you set up the partition (think of mkswap as being a reformat).

The question of how to protect your backup copies of sensitive files is entirely dependent on what system you use for backups. You may be able to pipe your backups through the des binary, or you may be able to store the backups on encrypted filesystems, but there are too many variations for me to offer much advice here. The security of your backups is not something that can be ignored, as has been made all too obvious with the various data disclosure scares that occur with alarming regularity when shipments of tapes or CDs fail to arrive at their destinations.


See my followup article for a warning about a vulnerability in this technique.

Sunday, February 17, 2008

Cryptographic mounts

Some of the data on my computers is stuff that I'd rather not let into the hands of a random stranger. Work-related files, proprietary data or source code, banking information, or other sensitive files. A laptop can go missing, an entire desktop computer can be carried away. It would be nice if the sensitive data were inaccessible in that event.

This leads us to cryptographic mounts: partitions whose contents cannot be read without the knowledge of a secret that is not stored in the computer. I use a passphrase, but if you are the kind of person who memorizes 32-digit hexadecimal numbers, you can skip the passphrase. The appropriate features to enable in the kernel, either as modules or compiled directly in, are MD (the same subsystem that controls RAID) and two features in that subsystem, BLK_DEV_DM (the device mapper) and DM_CRYPT. You also need a cryptographic algorithm available. I use AES encryption on my partitions, but there are many others available. I have activated the CRYPTO_AES module, plus the appropriate architecture specific module, CRYPTO_AES_X86_64 for my desktop machine and CRYPTO_AES_586 for my laptop.

So, let's say you have one or more blank partitions that you'd like to set up as cryptographic partitions, all with the same passphrase. You start with this script:
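#! /bin/sh

# Example values (assumed, not from the original listing); change them to
# match your system. Only /dev/sda6, Crypto1 and Crypto2 appear in the text.
partition=/dev/sda6
partition2=/dev/sda7
mapname1=Crypto1
mapname2=Crypto2
mtpt1=/crypto
mtpt2=/crypto2

echo -n "Enter the passphrase: "
read -s oneline

# Hash the passphrase through a here document so it never shows up in ps
{ hashed=`md5sum | awk ' { print $1 } '` ; } <<DONE
$oneline
DONE

dmsetup create $mapname1 <<DONE
0 `blockdev --getsize $partition` crypt aes-plain $hashed 0 $partition 0
DONE
dmsetup create $mapname2 <<DONE
0 `blockdev --getsize $partition2` crypt aes-plain $hashed 0 $partition2 0
DONE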

What this script does is to prompt the user for a passphrase, without echoing it to the screen. Once the passphrase is entered, it is converted to a 32 character hexadecimal string with the MD5 program. I use a here document, marked with the << characters, because that way the hexadecimal string does not appear in the process status list. Simply using echo risks having the secret visible to any user who types ps at the correct moment. Then, the dmsetup program creates the cryptographic mapping, using the hex sequence as the cryptographic key.
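The here-document step can be tried on its own, away from dmsetup. In this sketch the function name hash_passphrase is made up; it feeds its argument to md5sum through a here document, the way the script feeds the passphrase it has read.

```shell
# Turn a passphrase into a 32-digit hex string without putting it on a
# command line. The here document adds a trailing newline, so the result
# is the MD5 of the passphrase followed by a newline.
hash_passphrase() {
    { md5sum | awk '{ print $1 }' ; } <<DONE
$1
DONE
}

# Example with a throwaway passphrase:
hash_passphrase "not my real passphrase"
```

The trailing newline matters: any other tool used to derive the same key has to hash exactly the same bytes, which is one reason to always go through the same script.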

You will have to change the values of the $partition and $partition2 variables to correspond to those on your system. Note that volume labels are unavailable, because the system can't read the label off a cryptographic partition before the passphrase has been supplied.

Run this script, entering the passphrase. It's important that you do this through the script, and not manually at the command line, because later you'll modify the script to mount your cryptographic partitions, and you want to ensure that exactly the same code read your passphrase when you created the partitions as will read your passphrase when you try to mount the partitions after a reboot some time in the future.

When the script exits, you will have two new objects appearing in the /dev/mapper directory. In this case, they are /dev/mapper/Crypto1 and /dev/mapper/Crypto2. So, in this example, /dev/sda6 is the encrypted volume, and /dev/mapper/Crypto1 is the decrypted version of the same volume. You do all of your work on /dev/mapper/Crypto1. You format and mount that device, never /dev/sda6.

This command will create an ext3 filesystem with 0 bytes reserved for the superuser.
/sbin/mke2fs -j -m 0 /dev/mapper/Crypto1

Now, you can mount /dev/mapper/Crypto1 onto a mount point, and start copying files there as usual. Until you remove the cryptographic mapping, the data is available as a normal mounted partition. So, we now append some code to the script above to allow the partitions to be mounted by the root user after a reboot. Take the script above and add the following lines to the bottom:
/sbin/e2fsck -y /dev/mapper/$mapname1 || \
{ dmsetup remove $mapname1 ; echo "" ; echo "fsck failed"; exit 1; }

/sbin/e2fsck -y /dev/mapper/$mapname2 || \
{ dmsetup remove $mapname1; dmsetup remove $mapname2 ;\
echo "" ; echo "fsck failed"; exit 1; }

mount -onodiratime /dev/mapper/$mapname1 $mtpt1 || \
{ dmsetup remove $mapname1 ; dmsetup remove $mapname2 ; \
echo "" ; echo "Failed" ; exit 1 ; }

mount -onodiratime /dev/mapper/$mapname2 $mtpt2 || \
{ umount $mtpt1 ; \
dmsetup remove $mapname1 ; dmsetup remove $mapname2 ; \
echo "" ; echo "Failed" ; exit 1 ; }
echo ""

This runs fsck on the partitions, if necessary (remember, fstab can't fsck these partitions because it doesn't know the passphrase). Note that if you entered the wrong passphrase, you'll find out at this point, when e2fsck fails to identify the partition as being an ext2 or ext3 partition.

It then manually mounts the cryptographic partitions onto the mountpoints in $mtpt1 and $mtpt2. In the event of a mount failure, it unmounts everything and removes the cryptographic mappings.

The next time the computer is rebooted, the root user will have to run this script and enter the correct passphrase before the data on those drives is readable. If somebody else obtains your laptop, any mounted cryptographic partitions will be unavailable if the computer is rebooted, or the drive removed from the laptop and inserted into another machine.

This is only half the story. In a later post I'll describe the care you have to take to make sure your sensitive data does not wind up as readable plaintext somewhere on your filesystem.

Why do I have so many hard drives?

There are five hard drives in my main computer. There is no RAID setup. Why?

Hard drives fail. I've had the drive holding my root partition fail more than once. When that happened, I used to restore from backup. I would make a backup tape at least once a week, but a badly timed disk failure could still result in the loss of a lot of work.

My solution to this has been to buy my hard drives in matched pairs. I partition them equally, format them the same way, and install them both in the computer. One of them is the live disk, the other is the spare. The spare is kept unmounted and spun down. Every night around 3:00 AM, a cron job spins up the spare drives. Then, one partition at a time is fsck-ed, mounted, and copied to. The shell script uses rdist to synchronize the contents of the two partitions. Finally, I take special care to make the backup drive bootable. I use the LILO boot loader, so, when the root partition is mounted under /mnt/backup, the script executes the command:
/sbin/lilo -r /mnt/backup -b /dev/sdc

which, on my system, writes the LILO boot magic to the backup boot drive, which appears as /dev/sdc when it is the spare in my system. My lilo.conf file, on both the live system and the spare, refers to the boot drive as being /dev/sda, but this '-b' switch overrides that, so that the information is written to the boot block of the current /dev/sdc, but in a form appropriate for booting the device at /dev/sda (which it will appear to be should my live boot drive fail and be removed from the system).

Next, I use volume labels to mount my partitions. You can't have duplicate labels in the system, so my spare drive has labels with the suffix "_bak" applied. That means that the /etc/fstab file suitable for the live drive would not work if the spare were booted with that fstab. To solve this problem, the copying script runs this command after it finishes copying the files in /etc:
sed -e 's|LABEL=\([^ \t]*\)\([ \t]\)|LABEL=\1_bak\2|' /etc/fstab > /mnt/backup/etc/fstab

which has the effect of renaming the labels in the fstab to their versions with the _bak suffix, so they match the volume partitions on the spare hard drive.
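To see what the substitution does, it can be run against a few sample lines. This sketch is a portable variant rather than the exact command above: it uses POSIX character classes in place of the "\t" escapes, which not every sed understands inside brackets. The function name relabel_fstab and the sample fstab entries are made up.

```shell
# Append _bak to the LABEL= field of each fstab line read on standard input.
relabel_fstab() {
    sed -e 's|LABEL=\([^[:space:]]*\)\([[:space:]]\)|LABEL=\1_bak\2|'
}

# Made-up sample entries; real use would read /etc/fstab instead.
relabel_fstab <<'EOF'
LABEL=root  /      ext3  defaults  1 1
LABEL=home  /home  ext3  defaults  1 2
/dev/sdb1   /data  ext3  defaults  1 2
EOF
```

Lines that mount by device name rather than by label pass through unchanged.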

OK, that sounds like a lot of work, why do I do it? What does it buy me?

First of all, it gives me automatic backups. Every night, every file is backed up. When I go to the computer at the beginning of the day, the spare drive holds a copy of the filesystem as it appeared when I went to sleep the night before. Now, if I do something really unwise, deleting a pile of important files, or similarly mess up the filesystem, I have a backup that I haven't deleted. If I were to use RAID, deleting a file would delete it immediately from my backup, which isn't what I want. As long as I realize there's a problem before the end of the evening, I can always recover the machine to the way it looked before I started changing things in the morning. If I don't have enough time to verify that the things I've done are OK, I turn off the backup for a night by editing the script.

Another important thing it allows me to do is to test really risky operations. For instance, replacing glibc on a live box can be tricky. In recent years, the process has been improved to the point that it's not really scary to type "make install" on a live system, but ten years ago that would almost certainly have confused the dynamic linker enough that you would be forced to go to rescue floppies. Now, though, I can test it safely. I prepare for the risky operation, and then before doing it, I run the backup script. When that completes, I mount the complete spare filesystem under a mountpoint, /mnt/chroot. I chroot into that directory, and I am now running in the spare. I can try the unsafe operation, installing a new glibc, or a new bash, or something else critical to the operation of the Linux box. If things go badly wrong, I type "exit", and I'm back in the boot drive, with a mounted image of the damage in /mnt/chroot. I can investigate that filesystem, figure out what went wrong and how to fix it, and avoid the problem when the time comes to do the operation "for real". Then, I unmount the partitions under /mnt/chroot and re-run my backup script, and everything on the spare drive is restored. Think of it as a sort of semi-virtual machine for investigating dangerous filesystem operations.

The other thing this gives me is a live filesystem on a spare drive. When my hard drive fails (not "if" but "when": your hard drive will fail one day), it's a simple matter of removing the bad hardware from the box, re-jumpering the spare if necessary, and then rebooting the box. I have had my computer up and running again in less than ten minutes, having lost, at most, the things I did earlier in the same day. While you get this benefit with RAID, the other advantages listed above are not easily available with RAID.

Of course, this is fine, but it's not enough for proper safety. The entire computer could catch fire, destroying all of my hard drives at once. I still make periodic backups to writable DVDs. I use afio for my backups, asking it to break the archive into chunks a bit larger than 4 GB, then burn them onto DVDs formatted with the ext2 filesystem (you don't have to use a UDF filesystem on a DVD, ext2 works just fine, and it's certain to be available when you're using any rescue and recovery disk). Once I've written the DVDs, I put them in an envelope, mark it with the date, and give it to relatives to hang onto, as off-site backups.

So, one pair of drives is for my /home partition, one pair for the other partitions on my system. Why do I have 5 drives? Well, the fifth one isn't backed up. It holds large data sets related to my work. These are files I can get back by carrying them home from the office on my laptop, so I don't have a backup for this drive. Occasionally I put things on that drive that I don't want to risk losing, and in that case I have a script that copies the appropriate directories to one of my backed-up partitions, but everything else on that drive is expendable.

There are two problems that can appear with large files.
  • rdist doesn't handle files larger than 2 GB. I looked through the source code to see if I could fix that shortcoming, and got a bit worried about the code. So I'm working on writing my own replacement for rdist with the features I want. In the meantime, I rarely have files that large, and when I do, they don't change often, so I've been copying the files to the backup manually.
  • Sometimes root's shells, even those spawned by cron, have ulimit settings. If you're not careful, you'll find that cron jobs cannot create a file in excess of some maximum size, often 1 GB. This is an inconvenient restriction, and one that I have removed on my system.
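A backup script run from cron can check the file-size limit it inherited before it starts writing large files. The following is a minimal sketch for a POSIX shell; the function names are hypothetical, and the unit that "ulimit -f" reports in (512-byte or 1024-byte blocks) depends on the shell.

```shell
#!/bin/sh
# Hypothetical helper: report the file-size limit this shell inherited.
# Prints a number of blocks (block size varies by shell) or "unlimited".
file_size_limit() {
    ulimit -f
}

# Try to lift the limit. This only succeeds for root, or when the hard
# limit already allows it; the return status is non-zero otherwise.
lift_file_size_limit() {
    ulimit -f unlimited 2>/dev/null
}
```

Calling file_size_limit at the top of a cron job and logging the result is a quick way to find out whether such a limit applies on a given system.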

Breaking packages

I've used the term "breaking packages" a few times. As I said, I maintain my Linux boxes without a package manager. So, how did I get these Linux boxes?

My main Linux computer has over half a million files in its filesystem, and over 3000 separate executables. Where did they all come from? You need some way to start out; your computer isn't going to do much without a kernel, a shell, and a compiler.

In 1994, I installed Slackware on a 486-based computer. This computer had about 180 MB of hard drive space (nowadays that wouldn't even hold half of the kernel source tree) and 16 MB of RAM. At that time, Slackware didn't really have a package manager. It had packages, just compressed tar files of compiled binaries, grouped by function. If you weren't interested in networking, you didn't download the networking file. If you weren't interested in LaTeX, you didn't download that file. There were only a few dozen "packages", because of this very coarse granularity. The functions like "upgrade", "install", "find package owning file" weren't present. An upgrade was the same as an install, just install the new package into the filesystem, and it would probably replace the old package. To find out which package provided a certain file, you could look in per-package lists of files.

So, I never really had a package manager on that system. When I needed new programs, I downloaded the source code, compiled it, and installed it. When I moved to a new system, I brought backup images or a live hard drive to the new computer. I didn't start with a blank hard drive, I started with the hard drive from the old computer I was replacing. Over the years, I have replaced every executable that was installed in 1994 (I know this is the case because all of the files installed then were a.out format, and I have only ELF binaries on my computer now).

Sometimes, though, I've started with a computer that had a distribution installed on it. At a previous job, my laptop came with Mandrake Linux installed on it. I tried to keep the distribution alive for a while, but eventually got impatient with the package management system and broke the packages.

So, if you give me a new Linux computer and tell me it's mine to modify, a good first step for me is to kill the package manager. On an RPM-based system, that's generally achieved by recursively deleting the directory /var/lib/rpm. After that, the rpm command will stop working, and I have the finer control and more difficult task of managing the box myself.

What do we have running on that box?

As I mentioned in my first post, when you install a distribution you sometimes have programs running without your knowledge. Because some users may need these features, you often get them as well.

Last week at the office, somebody came up to me to ask if I could figure out why it was no longer possible to log into one of the servers. This server has a history of flakiness, there's probably a bad memory module on the board, and sometimes it becomes unresponsive. So, my co-worker, upon realizing that he couldn't log in, had rebooted the computer. However, even after the reboot, he still couldn't log on, either as his regular user through SSH, or as root on the console.

The first step, before getting out of my chair, was to telnet to port 22 on the box. I got a "connected" message, and a text string indicating that I was attached to an SSH daemon. This told me that the kernel was alive, it was accepting new connections and passing them to the appropriate processes, which were themselves able to make forward progress. So, the box wasn't wedged. I went to the console, and tried to log in through the getty running on the text login screen. I entered 'root' at the username, and got a password prompt. When I entered the password and pressed ENTER, the getty process froze, and did not present me with a shell.

So, we have two very different authentication schemes that are failing to allow logins. SSH doesn't let a regular user in, and the console doesn't allow root logins. Something seemed to be interfering with the general activity of authentication. The first thought is that this might be a PAM problem, but it would be a strange one. We didn't get authentication failure messages, we got a hang after authentication. Root's credentials were stored on the local drive, so it wasn't an LDAP issue, and in any case, the machine was on the network, and there weren't LDAP problems anywhere else in the office.

When multiple independent programs fail together, the next thought is that there's probably a full disk somewhere. If you fill up /tmp your system can start to behave very strangely. The login problems were a symptom of something, as yet unknown. So, the next thing to do is to check the hard drive to see if we had any full partitions. Because I didn't know what else might be misbehaving, I wanted to avoid all of the startup jobs, so I rebooted the machine with an appended kernel parameter, "init=/bin/bash". Instead of running the usual /sbin/init, and all of the various scripts under /etc/init.d, the computer would start up the kernel and then drop immediately to a root shell. No logins, no passwords, no startup scripts. I could then run 'df' at the prompt, and confirm that there were no partitions within 5% of full (remember that a default ext2 format will reserve 5% of the blocks for root, so a disk that's 96% full could actually be entirely full for some users). Checking with 'df -i' showed that we had not run out of inodes either.
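The check for nearly full partitions can be scripted. The sketch below is not what was run on that server, just an illustration: it assumes the POSIX output format of "df -P" (usage in the fifth column, mount point in the sixth), and the function name is hypothetical. Mount points containing spaces would be truncated.

```shell
#!/bin/sh
# Hypothetical helper: read "df -P" style output on stdin and print the
# mount point and usage of every filesystem at or above the given percentage.
nearly_full() {
    threshold=$1
    awk -v limit="$threshold" '
        NR > 1 {                 # skip the header line
            use = $5
            sub(/%$/, "", use)   # "96%" becomes "96"
            # ignore entries with no numeric usage (some report "-")
            if (use ~ /^[0-9]+$/ && use + 0 >= limit + 0)
                print $6, $5
        }'
}
```

Typical use would be "df -P | nearly_full 95" for blocks, and "df -P -i | nearly_full 95" for inodes with GNU df, whose inode listing keeps the percentage in the same column.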

So, what's next? I decided to reboot the machine into single-user mode so that I could easily modify files on the disk but still get onto the computer without a password. This is done by appending the parameter "S" on the kernel boot line. Again, I got a shell, but this time the disks were mounted read-write, and various services had started up. So, I modified the inittab, replacing the getty on tty1 with /bin/bash. That means that when the computer is rebooted into multi-user mode, tty1 has a root shell while the other ttys are still running their gettys.

Reboot into the usual multi-user mode. I have a root shell on tty1. I run "ps ax", and find the PID of the getty on tty2. Then, I run the command
strace -f -p <PID>

at the shell prompt of tty1. Changing virtual consoles to tty2 with the usual command, CTRL-ALT-F2, I am presented with a login prompt. I enter the username 'root', and enter the password. The program hangs. So, I change back to tty1 to see what strace has to say about the program. The last things the program did are on the screen. It opened a device called /dev/audit, did some things with it, then issued an ioctl() on the file descriptor. That ioctl was not returning to the caller, so the program was blocking waiting for a response from something associated with /dev/audit.
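Picking the getty's PID out of the "ps ax" listing can also be scripted. This is a small sketch, assuming the usual procps column layout (PID, TTY, STAT, TIME, COMMAND) and a getty program whose name contains "getty" (mingetty, agetty, and so on); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ps ax" output on stdin and print the PID of the
# first process on the given tty whose command name contains "getty".
getty_pid_on() {
    tty=$1
    awk -v tty="$tty" '$2 == tty && $5 ~ /getty/ { print $1; exit }'
}
```

With that defined, the strace invocation becomes: strace -f -p "$(ps ax | getty_pid_on tty2)"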

None of us had heard of /dev/audit, so it was time to do a bit of research. It turned out to belong to an auditing package that was included in the RHEL distribution installed on that computer. There is communication between the device and a daemon. That daemon keeps logs, so I went to its logging directory to see what was there. I found 4 GB of data there. Apparently that had reached some sort of internal limit, and the daemon responded by forbidding further auditable actions until some of the logs were removed by the administrator. Logins, being auditable actions, were blocked.

So, delete all of the logs in the directory, and reboot the computer. Everything returned to normal.

Now, a logging function like this is very useful for some users. There are some people who must know exactly who logged into the machine, what database entries they accessed or modified, and so on. We are not such people. A service we never knew about, enabled for all because it is useful to some, wound up locking us out of our own machine.

It's a security feature that logins are forbidden until the logs have been inspected and removed. If you're going to design a function like this, then this is the correct way to go about it. Of course, it was very easy for me to overcome this security feature with access to the console, but that's generally true. I probably would have set it up so that gettys are permitted to log in as root even when an audit failure occurs, but that level of flexibility may not be available, if the behaviour is driven by a special PAM module or a patched glibc.

Breaking packages on the MythTV box

As I mentioned earlier, I have a MythTV computer, installed from packages, but I've broken some of the packages. Here are some of the issues that I had with the packages, and how I solved them.

Two of the package manager drawbacks I've mentioned previously appear here: the one-size-fits-all approach to software packaging, and the failure to receive timely updates.

The MythTV box is on old hardware. Because it has hardware assistance for both MPEG encoding and decoding, I didn't need a new computer with a fast CPU. The fact that this is old hardware, with a 7-year-old BIOS, may be why I had problems, but I found it easier to break the packages than to try to solve the problems under the constraints of the package system.

First, the MythTV box controls an infra-red LED attached to its serial port, allowing it to change the channels on a digital cable box. This requires the use of the LIRC package, and the lirc_serial kernel module. Well, at the time I set this up, the lirc_serial module was having problems with the SMP kernel. The system would generate an oops quite regularly when it wanted to change channels. Looking at the oops logs, I could see that there were problems specifically with SMP. My MythTV box has only one CPU, so I didn't need an SMP kernel, but because some users will have SMP computers, the KnoppMyth distribution ships with an SMP kernel. I tried to find a non-SMP kernel for the system, without success. So, the easiest way to fix the problem was just to download a recent kernel source tree, copy the configuration file from the Knoppix kernel, and reconfigure it as non-SMP. The spontaneous reboots stopped occurring. The package manager still believes that it knows what kernel is running on the computer, but that isn't what is really installed.

When I installed the MythTV box, the software was still a bit immature, and a stability fix in the form of version 0.20 came out several months later. I waited a few weeks with no update to the distribution, and no word of when an update might become available. Eventually, I grew impatient and downloaded the source code of 0.20 myself, recompiled it on the MythTV box, and installed it over the existing programs.

There was one other impact of the one-size-fits-all approach that caused difficulties with the MythTV box. I was regularly recording a television show between 6:00AM and 6:30AM. A few minutes before the end of the show, the recording would have problems. The audio would break up, and the video would jump. It appeared that the program was losing frames of data, either because it was losing interrupts, or because it couldn't get the data to the disk quickly enough. Because it happened at about the same time every day, I expected it was probably a cron job. I got a root shell on the box, and asked for the list of all root-owned cron jobs with the command "crontab -l". This reported that there were no root-owned cron jobs. I mistrusted this result, and did more investigation. As I mentioned in the first post, distribution packagers often break up a configuration file into a set of separate files. They did that with cron jobs, which means that the command-line tool that ought to tell you all about root-owned cron jobs didn't report the full set of such jobs. A bit of digging around in /etc showed that the slocate database update was being run at that time. This process scans the entire disk, making a list of the files on it. While probably useful in a general context, this is an unnecessary operation on an appliance box that isn't changing, particularly when it results in so much bus traffic that the primary function of the box is degraded. My solution was to change the /etc/crontab file (which is, itself, not viewed by "crontab -l") so that a cron job would be skipped if there were any users (reported by the 'fuser' command) of either of the two video input devices, /dev/video0 and /dev/video1.
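One way to express that guard is a small wrapper around the job. This is a sketch rather than the exact change made to /etc/crontab: the function name is hypothetical, and it relies on the psmisc fuser exiting with status 0 when at least one of the named files is open by some process ("-s" keeps it silent).

```shell
#!/bin/sh
# The two video input devices named above.
VIDEO_DEVICES="/dev/video0 /dev/video1"

# Hypothetical wrapper: run the given command only when no process has any
# of the video devices open.
run_unless_recording() {
    # Word splitting of the device list is intended here.
    if fuser -s $VIDEO_DEVICES 2>/dev/null; then
        return 0        # a recording is in progress: skip the job quietly
    fi
    "$@"
}
```

Saved in a script that ends by calling run_unless_recording "$@", the cron entry's command field would then invoke that script with the original job (for example the slocate update command) as its arguments, leaving the schedule fields as they were.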