CeWL is a custom wordlist generator written by Robin Wood. Written in Ruby, CeWL takes a target website as an argument and crawls the site for HTML, MS Office (2007 and earlier) and PDF documents. For each supported document, CeWL extracts the words, email addresses and metadata to build a wordlist.
Used with tools such as Asleap and coWPAtty, CeWL’s wordlist generation technique can be very useful because it builds a dictionary from words found on the target website. This often includes project names, acronyms and other content that applies specifically to the target, so it may succeed in a dictionary attack where standard dictionary words would not.
While I’m working on another project, I’ve departed from Gentoo to run Ubuntu 9.10. I’m looking forward to the day I can return to Gentoo, but until then, I got CeWL to run on Ubuntu without much complication:
$ sudo apt-get install exif libimage-exiftool-perl
$ sudo gem install http_configuration spider mime-types mini_exiftool rubyzip
$ echo "export RUBYOPT=\"rubygems\"" >>~/.bashrc
$ source ~/.bashrc
$ wget http://www.digininja.org/files/cewl_2.2.tar.bz2
$ tar xvfj cewl_2.2.tar.bz2
$ cd cewl
$ ./cewl.rb --help
cewl 2.0 Robin Wood (dninja@gmail.com) (www.digininja.org)
Usage: cewl [OPTION] ... URL
--help, -h: show help
--depth x, -d x: depth to spider to, default 2
--min_word_length, -m: minimum word length, default 3
--offsite, -o: let the spider visit other sites
--write, -w file: write the output to the file
--ua, -u user-agent: useragent to send
--no-words, -n: don't output the wordlist
--meta, -a: include meta data
--email, -e: include email addresses
--meta-temp-dir directory: the temporary directory used by exiftool when parsing files, default /tmp
-v: verbose
URL: The site to spider.
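As a rough sketch of how these options might be combined, the snippet below crawls a site with the -d, -m and -w options from the help text, then trims the result for use with a WPA-PSK tool such as coWPAtty. The target URL, the file names and the two function names (build_wordlist, filter_wpa_candidates) are placeholders of mine, not part of CeWL. The length filter relies on WPA-PSK passphrases being 8 to 63 characters long.

```shell
#!/bin/sh
# Sketch only: URL, file names and function names are hypothetical.

# Crawl the target using options listed in the help output:
#   -d 2  spider depth (the default)
#   -m 5  minimum word length
#   -w    file to write the wordlist to
# $1 = site to spider, $2 = output file
build_wordlist() {
    ./cewl.rb -d 2 -m 5 -w "$2" "$1"
}

# Keep only lines that could be WPA-PSK passphrases (8 to 63 characters)
# and remove duplicates. $1 = wordlist file; result goes to stdout.
filter_wpa_candidates() {
    awk 'length($0) >= 8 && length($0) <= 63' "$1" | LC_ALL=C sort -u
}

# Example run (not executed here):
#   build_wordlist http://www.example.com/ cewl_words.txt
#   filter_wpa_candidates cewl_words.txt > wpa_words.txt
```

Filtering after the crawl, rather than raising -m to 8, keeps the full wordlist around for tools that have no minimum length.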
CeWL is one of the tools we cover in my Ethical Hacking Wireless course, running next in New Orleans on January 11-16. It’s not too late to sign up for this class, and escape the winter chill for good food and wireless hacking in New Orleans.
-Josh