I had to get some statistics about file sizes today, but couldn’t really find a tool for the job, so naturally, I wrote one.
import os, sys, re
from os.path import join, getsize, exists

def median(numbers):
    # median of a non-empty list of numbers
    s = sorted(numbers)
    l = len(s)
    if l % 2 == 0:
        # even count: mean of the two middle values
        a, b = s[l // 2 - 1 : l // 2 + 1]
        return (a + b) / 2.0
    else:
        return s[l // 2]

sizes = []
req_re = None
target = '.'
if len(sys.argv) > 1:
    target = sys.argv[1]
if len(sys.argv) == 3:
    req_re = re.compile(sys.argv[2])

for root, dirs, files in os.walk(target):
    for name in files:
        absp = join(root, name)
        # skip broken symlinks
        if exists(absp):
            if not req_re or req_re.search(absp):
                sizes.append(getsize(absp))

num = len(sizes)
if num == 0:
    sys.exit("No files found")
total = sum(sizes)
print("Num files: %d" % num)
print("Average : %0.2f KB" % (float(total) / num / 1024.0))
print("Median : %0.2f KB" % (median(sizes) / 1024.0))
print("Min : %0.2f KB" % (min(sizes) / 1024.0))
print("Max : %0.2f KB" % (max(sizes) / 1024.0))
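Usage should be self-explanatory: the first argument is the directory to scan (it defaults to the current directory), and an optional second argument is a regular expression that file paths must match.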
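On Python 3, the same numbers can be had with less code, since the standard library ships a statistics module. The following is only a sketch along those lines, not the script above; the function names collect_sizes and summarize are made up for illustration.

```python
import os
import re
import statistics


def collect_sizes(target=".", pattern=None):
    """Return the sizes in bytes of all files found under target.

    If pattern is given, only paths matching that regular expression count.
    """
    regex = re.compile(pattern) if pattern else None
    sizes = []
    for root, _dirs, files in os.walk(target):
        for name in files:
            path = os.path.join(root, name)
            # Skip broken symlinks and files that vanished during the walk.
            if not os.path.exists(path):
                continue
            if regex is None or regex.search(path):
                sizes.append(os.path.getsize(path))
    return sizes


def summarize(sizes):
    """Return count, mean, median, min and max (in bytes) of a non-empty list."""
    if not sizes:
        raise ValueError("no files matched")
    return {
        "count": len(sizes),
        "mean": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "min": min(sizes),
        "max": max(sizes),
    }
```

Calling summarize(collect_sizes("/some/dir", r"\.log$")) would give the figures for all paths ending in .log; divide by 1024.0 to get KB as in the script.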
Posted by Mads Sülau Jørgensen at 4:00 pm on February 4th, 2010.
Categories: Uncategorized. Tags: Python, Work.
I keep doing the same ipfw commands over and over. Enough of that, here is my first AppleScript application ever. It is probably filled with bugs and other scary things, and I’m probably not the first one to do this, but I think I’m the first to stick the source out there.
property FLUSH_TEXT : "Quit and flush"
property SET_TEXT : "Set speed"
-- be damn careful what you input here, it will run as root
on ipfwLimit(bandwidth)
    my ipfwFlush()
    do shell script "ipfw pipe 1 config bw " & bandwidth & "KB" with administrator privileges
    do shell script "ipfw add 10 pipe 1 tcp from any 80 to me" with administrator privileges
    do shell script "ipfw add 11 pipe 1 tcp from me to any 80" with administrator privileges
end ipfwLimit

-- flush any ipfw rules
on ipfwFlush()
    do shell script "ipfw -f flush" with administrator privileges
end ipfwFlush

on main()
    set question to display dialog "Control your http traffic speed" buttons {FLUSH_TEXT, SET_TEXT} default button 2
    set answer to button returned of question
    if answer is equal to FLUSH_TEXT then
        my ipfwFlush()
    end if
    if answer is equal to SET_TEXT then
        set bandwidth_question to display dialog "Enter bandwidth in KB/s (don't do something stupid like entering \"; rm -rf /)" default answer "56"
        set bandwidth to text returned of bandwidth_question
        my ipfwLimit(bandwidth)
        -- show the dialog again until "Quit and flush" is chosen
        my main()
    end if
end main
my main()
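Since the bandwidth ends up in a command that runs as root, it is worth checking the input before using it. Here is a rough sketch of that idea in Python, not a port of the application: the function name ipfw_commands is made up, and the ipfw arguments are simply the ones from the script above.

```python
import re


def ipfw_commands(bandwidth):
    """Build the ipfw argument lists for limiting HTTP traffic.

    bandwidth is the user's input in KB/s. Anything but plain digits is
    rejected, so the value cannot smuggle in extra shell syntax.
    """
    text = str(bandwidth).strip()
    if not re.fullmatch(r"[0-9]+", text) or int(text) == 0:
        raise ValueError("bandwidth must be a positive whole number")
    return [
        ["ipfw", "-f", "flush"],
        ["ipfw", "pipe", "1", "config", "bw", text + "KB"],
        ["ipfw", "add", "10", "pipe", "1", "tcp", "from", "any", "80", "to", "me"],
        ["ipfw", "add", "11", "pipe", "1", "tcp", "from", "me", "to", "any", "80"],
    ]
```

Each list could then be handed to subprocess.run as root, without a shell in between, so nothing in the input is ever interpreted as shell syntax.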
Posted by Mads Sülau Jørgensen at 12:40 pm on September 15th, 2009.
Categories: Uncategorized.
I’ve recently installed Apple’s new 64-bit OS, Snow Leopard, on my work computer. I use PostgreSQL extensively together with Python, and usually use Apple’s bundled python2.5 for working with Django.
As the daredevil I am, I wanted to recompile all my macports to use the new 64-bit system, so I deleted them all and made a fresh install of macports. After building the postgresql81 port, I was about to build the psycopg2 python postgresql driver for python 2.5, when it gave me a warning about not being able to find some symbols in the postgresql library it had linked to. I quickly realized that this might be an architecture problem, and sure enough, it turns out that python 2.5 is an i386/ppc binary and python 2.6 is an x86_64/i386/ppc binary, as can be seen here:
$ file `which python`
/usr/bin/python: Mach-O universal binary with 3 architectures
/usr/bin/python (for architecture x86_64): Mach-O 64-bit executable x86_64
/usr/bin/python (for architecture i386): Mach-O executable i386
/usr/bin/python (for architecture ppc7400): Mach-O executable ppc
$ file `which python2.5`
/usr/bin/python2.5: Mach-O universal binary with 2 architectures
/usr/bin/python2.5 (for architecture i386): Mach-O executable i386
/usr/bin/python2.5 (for architecture ppc7400): Mach-O executable ppc
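The file command lists every architecture in the binary. To see which one an interpreter is actually running as, you can also ask Python itself. A small sketch (the function name interpreter_info is made up); note that it only reports on the running process, not on the other slices of a universal binary.

```python
import platform
import struct
import sys


def interpreter_info():
    """Describe the running interpreter: version, pointer width, machine."""
    return {
        "version": "%d.%d" % sys.version_info[:2],
        # Size of a C pointer in bits: 32 for an i386 build, 64 for x86_64.
        "bits": struct.calcsize("P") * 8,
        # Machine type as reported by the OS, for example "i386" or "x86_64".
        "machine": platform.machine(),
    }
```

A 32-bit interpreter linking against a 64-bit-only library is exactly the mismatch that produces missing-symbol warnings like the one above.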
The solution seemed so simple. Recompile postgresql81 for both architectures, and let the linker figure out the rest.
Building the postgresql81 port as the +universal variant does not work. It has something to do with the fact that the linker (ld) does not know how to produce a binary for multiple architectures. After a good night’s sleep, the solution was only a trac ticket away.
So, to build an i386 and x86_64 version of postgresql8x via macports, you have to patch the Portfile, which is located in /opt/local/var/macports/sources/rsync.macports.org/release/ports/databases/postgresql81.
That can be done like this. Notice that the patch seems to put the files in the wrong place, so we’re moving them as well:
$ cd /opt/local/var/macports/sources/rsync.macports.org/release/ports/databases/postgresql81
$ curl -s http://trac.macports.org/raw-attachment/ticket/14619/combined_updated_universal.patch | sudo patch
$ sudo mkdir files/
$ sudo mv ld.sh files/
$ sudo mv patch_pg_config_h files/
Now you can go ahead and build the postgresql81 port with both architectures, like so:
$ sudo port install postgresql81 +universal
And then, finally, we can build the psycopg2 extension for python:
$ wget http://initd.org/pub/software/psycopg/psycopg2-2.0.12.tar.gz
$ tar zxf psycopg2-2.0.12.tar.gz
$ cd psycopg2-2.0.12
$ sudo python2.5 setup.py install
$ sudo python2.5 setup.py clean
$ sudo python2.6 setup.py install
And you’re off.
Posted by Mads Sülau Jørgensen at 11:42 am on September 3rd, 2009.
Categories: Uncategorized.
I’m running several different projects on AWS, which used to be a pain, as I often find myself having to use the identity of project-a together with the official Amazon EC2 tools.
To help myself manage the multiple identities, I wrote a set of bash functions:
aws_load <config-name> – loads configuration from config-name
ec2ssh <instance-number-in-ec2din-list> – ssh’s into a given instance, with the root key
ec2scp – a shorthand for scp -i <keyfile>
I keep the configuration files in ~/amazon/conf/<config-name>.sh and the keypairs in ~/amazon/keypairs/, but that should be easy to change.
To change or load an identity, one simply calls the function from a shell prompt like so:
mads@workmads ~ % aws_load some-identity
loaded certificate ...
loaded /Users/mads/amazon/conf/some-identity.sh (...)
I hope someone finds this as useful as I do.
Functions (could be placed in .bashrc or .zshrc).
function aws_load {
    if [ -n "$1" ]; then
        ec2_configurations="$HOME/amazon/conf"
        ec2_keys="$HOME/amazon/keypairs"
        conf="$ec2_configurations/$1.sh"
        if [ -x "$conf" ]; then
            # forget the previous identity before loading the new one
            unset AMAZON_ID AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_CERT EC2_PRIVATE_KEY EC2_CERT AWS_KEYPAIR_NAME
            source "$conf"
            if [ -n "$AWS_KEYPAIR_NAME" ]; then
                export AWS_SSH_KEY="$ec2_keys/id_rsa_${AWS_KEYPAIR_NAME}-keypair"
            fi
            if [ -n "$AWS_CERT" ]; then
                export EC2_PRIVATE_KEY=~/.ec2/pk-$AWS_CERT.pem
                export EC2_CERT=~/.ec2/cert-$AWS_CERT.pem
                echo "loaded certificate $AWS_CERT"
            fi
            echo "loaded $conf ($AMAZON_ID)"
        else
            echo "configuration $conf not found (or not executable)"
        fi
    else
        echo "usage: aws_load <configuration name>"
    fi
}

function ec2ssh {
    if [ -n "$1" ]; then
        # take the 4th field of the N'th instance line in the ec2din output
        HOST="$(ec2din | awk '/i-/ {print $4}' | tail -n +"$1" | head -n 1)"
        ssh -i "$AWS_SSH_KEY" -l root "$HOST"
    else
        echo "Please write a number"
    fi
}

function ec2scp {
    scp -i "$AWS_SSH_KEY" "$@"
}
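The awk, tail and head pipeline in ec2ssh is a bit cryptic, so here is what it does, spelled out as a Python sketch. The function name nth_instance_host is made up, and the sketch only covers numbers from 1 and up.

```python
def nth_instance_host(ec2din_output, n):
    """Return the 4th whitespace-separated field of the n-th line containing "i-".

    For n >= 1 this mirrors: awk '/i-/ {print $4}' | tail -n +N | head -n 1
    Returns None when there is no such line.
    """
    hosts = []
    for line in ec2din_output.splitlines():
        if "i-" in line:
            fields = line.split()
            # awk prints an empty string when the field is missing.
            hosts.append(fields[3] if len(fields) > 3 else "")
    if n < 1 or n > len(hosts):
        return None
    return hosts[n - 1]
```

So ec2ssh 2 picks the second line mentioning an instance id and uses its fourth column as the host name to ssh into.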
Configuration “file” template to be placed in ~/amazon/conf/<config-name>.sh:
#!/bin/sh
export AMAZON_ID=""
export AWS_ACCESS_KEY_ID=""
export AWS_SECRET_ACCESS_KEY=""
export AWS_CERT=""
export AWS_KEYPAIR_NAME=""
Happy identity switching.
Posted by Mads Sülau Jørgensen at 4:50 pm on July 22nd, 2009.
Categories: Python, Work.
Today, I had to copy 70 GiB of data from an ext3 filesystem to an XFS filesystem. This involved a lot of small files. After a couple of hours of waiting, I thought it’d be best to just leave it running and resume my activities the day after. But oh nooo, I forgot to run it in a screen.
Posted by Mads Sülau Jørgensen at 2:31 pm on July 16th, 2009.
Categories: Work.
Recently (10-20 minutes ago), Amazon CloudFront (a CDN) stopped sending DNS replies in Europe:
% dig -t ns cloudfront.net
; <<>> DiG 9.4.3-P1 <<>> -t ns cloudfront.net
;; global options: printcmd
;; connection timed out; no servers could be reached
I was going to do a guide on setting up Varnish to replace CloudFront temporarily (and did actually set up the instance and the software, so I might do the guide and an AMI anyway) when I realized that I (as well as most other people) can just change the relevant URL to point to the S3 bucket. Problem solved. That will, however, not be as fast as either CloudFront itself or a Varnish-cached backend.
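If the URLs are generated in code, the switch can be a one-line host swap. A minimal sketch, assuming the objects are reachable under the same paths on the bucket; swap_host is a made-up name, and the hostnames in the usage note are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_host(url, old_host, new_host):
    """Return url with its host replaced when it equals old_host.

    Path, query and fragment are kept; other URLs come back unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() != old_host.lower():
        return url
    return urlunsplit(parts._replace(netloc=new_host))
```

For example, swap_host("http://d111111abcdef8.cloudfront.net/img/logo.png", "d111111abcdef8.cloudfront.net", "example-bucket.s3.amazonaws.com") gives the same path on the bucket host instead.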
Should anyone be interested in how Varnish is set up to handle failures from CloudFront, I’ll happily do an AMI.
Posted by Mads Sülau Jørgensen at 10:51 am on June 30th, 2009.
Categories: Work. Tags: amazon aws, cache, cloudfront, varnish.
Until recently I’ve been using the file:// Django cache, but that has a “problem” when multiple users need to manipulate the cache (think uid 80 writes a key that uid 1000 wants to delete).
My problem with the memcached:// Django cache provider has been that it cannot handle being used on a shared memcached instance, because of the danger of key collisions.
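One common way around such collisions is to give every project its own key prefix. The sketch below only illustrates that idea and is not the Django cache API: PrefixedCache and DictClient are made-up names, and DictClient is just an in-memory stand-in for a real memcached client.

```python
class DictClient:
    """In-memory stand-in for a memcached client (illustration only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def delete(self, key):
        self.data.pop(key, None)


class PrefixedCache:
    """Wrap a cache client so every key carries a per-project prefix."""

    def __init__(self, client, prefix):
        self._client = client
        self._prefix = prefix

    def _key(self, key):
        # "project-a" + "user:1" -> "project-a:user:1"
        return "%s:%s" % (self._prefix, key)

    def get(self, key, default=None):
        value = self._client.get(self._key(key))
        return default if value is None else value

    def set(self, key, value):
        self._client.set(self._key(key), value)

    def delete(self, key):
        self._client.delete(self._key(key))
```

Two projects sharing one instance can then both store a key called "frontpage" without stepping on each other, as long as their prefixes differ.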
Posted by Mads Sülau Jørgensen at 1:23 pm on June 23rd, 2009.
Categories: Django, Programming, Python, Work. Tags: cache, Django, memcached, prefix.
I’ve got nothing more to say than:
mads@workmads ~ % cat .ssh/config
ServerAliveInterval 60
Happy ssh’ing.
Posted by Mads Sülau Jørgensen at 7:24 pm on June 12th, 2009.
Categories: Uncategorized. Tags: EC2, SSH, Work.
Seeing as there is no really easy way to do an HTTP HEAD request from Python, I wrote up the following small method.
In advance I’d like to apologize for the way the method assembles the request path.
Update: Added handling of redirects.
def http_head(url):
    import httplib
    import urlparse
    redirects = 0
    while redirects < 10:
        scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
        if scheme == 'https':
            conn = httplib.HTTPSConnection(netloc)
        else:
            conn = httplib.HTTPConnection(netloc)
        # the fragment is never sent to the server, so only path and query are used;
        # an empty path is requested as "/"
        conn.request("HEAD", "%s%s%s" % (path or "/", query and "?" or "", query))
        res = conn.getresponse()
        if res.status in (301, 302) and res.getheader('location'):
            url = res.getheader('location')
            redirects += 1
        else:
            break
    return res.status, res.reason
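On Python 3, httplib and urlparse have moved to http.client and urllib.parse. A sketch of the same idea there, with a few liberties that are my own assumptions rather than part of the method above: the path assembly lives in a helper called request_target, relative Location headers are resolved with urljoin, and a few more redirect codes are followed.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def request_target(url):
    """Return the path-plus-query part of url as sent on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The fragment is never sent to the server, so it is left out.
    return target


def http_head(url, max_redirects=10):
    """Issue HEAD requests, following redirects; return (status, reason)."""
    for _ in range(max_redirects + 1):
        parts = urlsplit(url)
        if parts.scheme == "https":
            conn = http.client.HTTPSConnection(parts.netloc)
        else:
            conn = http.client.HTTPConnection(parts.netloc)
        try:
            conn.request("HEAD", request_target(url))
            res = conn.getresponse()
            location = res.getheader("Location")
        finally:
            conn.close()
        if res.status in (301, 302, 303, 307, 308) and location:
            # Location may be relative; resolve it against the current URL.
            url = urljoin(url, location)
        else:
            break
    return res.status, res.reason
```

If the redirect limit is hit, the status of the last redirect is returned, just like in the original.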
Posted by Mads Sülau Jørgensen at 12:30 pm on May 15th, 2009.
Categories: Uncategorized. Tags: HEAD, HTTP, Python, Work.
I keep forgetting how to format and indent xml from the command line. The tool xmllint does a fine job of doing just that, which has saved me numerous times whilst working with sports results. So. Much. Data.
Running xmllint --format on the input file will re-format and re-indent the XML, and check it for various errors while doing it.
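For the times when xmllint is not at hand, Python's standard library can do a similar re-indent. A rough sketch using xml.dom.minidom (the function name pretty_xml is made up); it is no replacement for xmllint's checks and only complains when the input is not well-formed.

```python
import xml.dom.minidom


def pretty_xml(text, indent="  "):
    """Parse text as XML and return it re-indented.

    Raises xml.parsers.expat.ExpatError if the input is not well-formed.
    """
    document = xml.dom.minidom.parseString(text)
    pretty = document.toprettyxml(indent=indent)
    # toprettyxml leaves blank lines where the input already had whitespace
    # between tags; drop them.
    return "\n".join(line for line in pretty.splitlines() if line.strip())
```

Feeding it the contents of a file and printing the result gives one element per line, indented by nesting depth.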
Posted by Mads Sülau Jørgensen at 3:11 pm on May 13th, 2009.
Categories: Uncategorized. Tags: Work, XML.