Recently I had a bizarre situation. I had been running a load of stuff through Hadoop on one of our EC2 clusters and the job had failed, but it failed in such a way that if I could save away the data (and the logs, for diagnostic purposes) and restart the cluster with some changed parameters, I would be able to recover the data and carry on. Having already invested about $60 in machine time and hours of head scratching on the failed run, I thought this was a good idea.
So, no problem: just use Hadoop's distcp to move the data up to s3n://bucketname. Hmm, that did not work out. OK, how about hadoop dfs -cp to copy all the data to a local directory and use s3cmd to move it to S3 storage? Hmm, that did not work out either. Something odd was going on here.
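For the record, the two attempts looked roughly like this. The bucket and path names are made up for illustration, and the exact invocations will vary with your Hadoop and s3cmd versions and how your AWS keys are configured:

    # attempt 1: distcp straight from HDFS to the native S3 filesystem
    # (assumes the AWS access keys are set in the hadoop config)
    hadoop distcp /user/hadoop/output s3n://bucketname/output

    # attempt 2: copy the data down to local disk, then push it with s3cmd
    hadoop dfs -cp /user/hadoop/output file:///mnt/output
    s3cmd sync /mnt/output/ s3://bucketname/output/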
Then it dawned on me that some of the files in the run were very large, as were the intermediate products: around 10G on average, and there were lots of them. Now, S3 has a limit of 2G for objects stored in its filesystem. I did not want to use Hadoop's s3:// non-native filesystem, because it is hard to verify that all the data has arrived safely when nothing else can read it. So I had a brainwave: successively mount and unmount an EBS volume on each machine, save away the relevant bits and pieces, and then move on to the next box. All of our base EC2 images have xfs support wired into the system, as well as /mnt/ebs as a mountpoint for an attached EBS volume, so it was simply a case of using ElasticFox to attach the EBS volume to the instance, issuing a "mount /mnt/ebs", and then "umount /mnt/ebs" once I was done.
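For anyone wanting to try the same trick, the per-box routine boiled down to something like the following. The volume and instance ids are placeholders, and the directories you grab will depend entirely on your own Hadoop config; these are just examples:

    # from the workstation: attach the volume to the current instance
    # (this is what ElasticFox is doing behind the scenes)
    ec2-attach-volume vol-xxxxxxxx -i i-xxxxxxxx -d /dev/sdf

    # on the instance: mount the volume, grab the data and logs, unmount
    mount /mnt/ebs
    mkdir -p /mnt/ebs/$(hostname)
    cp -r /mnt/hadoop/dfs /mnt/ebs/$(hostname)/dfs    # HDFS data directories (example path)
    cp -r /var/log/hadoop /mnt/ebs/$(hostname)/logs   # job and daemon logs (example path)
    umount /mnt/ebs

    # back on the workstation: detach and move on to the next box
    ec2-detach-volume vol-xxxxxxxx

Because our images already have the volume formatted as xfs and /mnt/ebs in fstab, the mount really is a one-liner; on a stock image you would need to mkfs and mount the device explicitly first.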
So effectively using an EBS volume as a virtual flash-drive.
Tuesday, 10 March 2009