Monday, April 8, 2013

XChat Script for Fetching URL Titles

If you use IRC regularly, you will appreciate having some idea of what's behind a posted URL before clicking on it. YouTube links and short URLs are prime examples. This XChat script fetches and prints the titles of posted websites. It is as efficient as possible, fetching only as much of a relevant URL as it needs to find a possible website title, and it can be used on any platform where XChat or HexChat runs.

Thanks to that technique, it is also immune to URLs of music and video streams, very large websites, and the like, which would otherwise make the script hang, and XChat along with it. In fact, that is exactly what used to happen, until I recently got motivated enough to completely overhaul how it works and, in doing so, finally make it safe for wider use.

In addition, the script notifies you of any image, audio, and document URLs it finds, as well as any other possible URIs. It also reprints found URIs that would otherwise not be clickable because punctuation characters are attached directly to them, plus bare "www.[...]" addresses.
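
The reprinting of URIs with attached punctuation could be sketched roughly as follows. This is a minimal illustration, not the script's own code: the sub name clean_uri, the exact set of punctuation characters stripped, and the choice of "http://" as the prefix for bare "www." addresses are all my assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a raw word from a chat line into a clickable URI.
# Assumption: the punctuation sets below are illustrative, not the script's own.
sub clean_uri {
    my ($token) = @_;

    # Strip leading punctuation such as "(", "<" or quotes.
    $token =~ s/^[(<\[{"']+//;

    # Strip trailing punctuation such as ".", ",", ")", "!" or quotes.
    $token =~ s/[.,;:!?)>\]}"']+$//;

    # Give bare "www." addresses a scheme so IRC clients make them clickable.
    $token = "http://$token" if $token =~ /^www\./i;

    return $token;
}

# Prints "http://www.example.org/page" and "http://example.com/a?b=1".
for my $word (split ' ', 'See (www.example.org/page), or "http://example.com/a?b=1".') {
    next unless $word =~ m{(?:^|\W)(?:https?://|www\.)}i;
    print clean_uri($word), "\n";
}
```

A rule this simple would also cut a legitimate closing parenthesis off URLs such as Wikipedia's "Foo_(bar)" pages, so a real implementation may need to handle balanced brackets more carefully.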


The script is written in Perl, so it needs XChat's Perl plugin to be installed and loaded, which it should be by default on any supported platform. It also requires the Perl module "LWP::UserAgent", so on Linux make sure the package "libwww-perl" or "perl-libwww-perl" (the name depends on the distro) is installed. For example, on Debian-based distros such as Ubuntu, run in a terminal:

sudo apt-get install libwww-perl

And on RPM-based distros, like Fedora, run:

sudo yum install perl-libwww-perl

Or on openSUSE / SUSE Linux Enterprise:

sudo zypper install perl-libwww-perl

If the module isn't installed by default with Perl on Windows or Mac OS, or if you don't use your distro's package system on Linux, you'll have to work out how to install it yourself, though.
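
Whichever way you install it, a quick way to check whether Perl can actually load the module is a few lines like these. The helper name module_available is hypothetical; it simply tries to require the module inside an eval and reports whether that succeeded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # LWP::UserAgent -> LWP/UserAgent.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my $module = 'LWP::UserAgent';
if (module_available($module)) {
    my $version = $module->VERSION;
    $version = 'unknown' unless defined $version;
    print "$module is installed (version $version)\n";
}
else {
    print "$module is NOT installed\n";
}
```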

To install the script itself on Linux, either download it manually into the top level of the directory "~/.xchat2" in your home directory (it's hidden, so you may need to press Ctrl+H in your file manager to see it), or just run in a terminal:

wget -t 5 -T 10 -O ~/.xchat2/

Manual download:

On Windows or Mac OS, similarly, just download the script into the top level of your XChat profile directory.

When placed in the top level of your XChat profile directory, the script is loaded automatically when XChat starts. Alternatively, you can load it manually from any other directory via "Window > Plugins and Scripts".

Note that you can safely ignore the "crunchbang" (shebang) line with Perl's path at the very top of the script: the script is run by XChat, which knows how to invoke Perl as long as the Perl plugin mentioned above is installed and loaded.


User Agent

You should make sure that the script identifies itself as the web browser you usually use, i.e. that it sends that browser's User Agent string, which is set in this line of the script:

$uagent->agent('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:22.0) Gecko/20100101 Firefox/22.0');


Character Limit

I currently have the script set to fetch only the first 12,100 characters from a relevant URL, which usually amounts to 12,288 bytes of fetched data. That byte count can grow substantially, though, the more multi-byte characters a page uses, most commonly Unicode ones (UTF-8, UTF-16, etc.).
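
To see how a character limit translates into fetched data, assume, as the comment in the script's limit line notes, that data is pulled in chunks of 4096 bytes, that the limit is checked after each whole chunk, that every chunk is full (real network reads can come in shorter, hence "usually"), and that each character takes one byte. Under those assumptions, the amount fetched is simply the limit rounded up to the next multiple of 4096. The helper bytes_fetched is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);

# Chunk size, per the comment in the script's limit line.
my $chunk_size = 4096;

# Hypothetical helper: bytes pulled in before a ">= $limit" check triggers,
# assuming full chunks and one byte per character.
sub bytes_fetched {
    my ($limit) = @_;
    return ceil($limit / $chunk_size) * $chunk_size;
}

# Prints "12100 characters -> 3 chunks -> 12288 bytes".
printf "%d characters -> %d chunks -> %d bytes\n",
    12100, ceil(12100 / $chunk_size), bytes_fetched(12100);
```

So raising the limit only changes how much is fetched once it crosses the next multiple of 4096: any limit from 12,289 up to 16,384 results in four chunks.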

If people often post URLs of websites that bury their HTML titles even deeper in the source code of their pages, you may want to increase that limit. It should be sufficient for most reasonably well set-up sites, though. Note, as the comment in the line below also indicates, that data is pulled in chunks of 4096 bytes, so the amount actually fetched grows in steps of 4096 bytes.

die if length($content) >= 12100;   # character limit for pulling in chunks of 4096 bytes each
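
The early-abort technique itself can be sketched with LWP::UserAgent's ":content_cb" and ":read_size_hint" options: the download is handed to a callback chunk by chunk, and dying inside that callback makes LWP abort the request (it records the reason in the response's "X-Died" header instead of propagating the error). This is an illustration under assumptions, not the script's actual code: the sub names fetch_title and extract_title, the 10-second timeout, the title regex, and stopping as soon as a closing </title> tag arrives are my own choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same limit as in the script's line above.
my $limit = 12100;

# Hypothetical helper: pull the <title> text out of a (possibly partial) HTML chunk.
sub extract_title {
    my ($html) = @_;
    return unless $html =~ m{<title[^>]*>(.*?)</title>}is;
    my $title = $1;
    $title =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    $title =~ s/\s+/ /g;         # collapse newlines and runs of spaces
    return $title;
}

# Hypothetical helper: fetch only the start of $url and return its title, if any.
# Assumes LWP::UserAgent is installed; it is only loaded when this sub is called.
sub fetch_title {
    my ($url) = @_;
    require LWP::UserAgent;

    my $ua = LWP::UserAgent->new(timeout => 10);
    $ua->agent('Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:22.0) Gecko/20100101 Firefox/22.0');

    my $content = '';
    $ua->get(
        $url,
        ':read_size_hint' => 4096,
        ':content_cb'     => sub {
            my ($chunk) = @_;
            $content .= $chunk;
            # Dying here makes LWP abort the download early.
            die "title found\n" if $content =~ m{</title>}i;
            # length() counts raw bytes here, since the callback gets undecoded data.
            die "limit reached\n" if length($content) >= $limit;
        },
    );
    return extract_title($content);
}

if (@ARGV) {
    my $title = fetch_title($ARGV[0]);
    my $out   = defined $title ? $title : '(no title found)';
    print "$out\n";
}
```

Saved under a hypothetical name such as title-sketch.pl, it can be tried outside XChat with "perl title-sketch.pl http://example.com/". Note that this regex approach leaves HTML entities such as "&amp;" undecoded and ignores the page's character encoding.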
