Regular Expressions, Text Processing, and Web Scraping

A workshop by Jadrian Miles, from the Brown University Computer Science Department

Part of THATcamp New England 2012 at Brown University, Providence, RI

Although the workshop is over, you can still download an annotated transcript of all the code typed at the command line during the session. With this transcript, the example code (zombify.py and pitchfork_scraper.py), and the plaintext of Pride and Prejudice, you should be able to recreate the magical experience of the workshop in the comfort of your own home (or a friend's!).

To assist you in your own exploration of Python, regular expressions, text processing, and web scraping, here are some resources:

Data sources:

Useful links:

Somewhat less useful links:

Please also see the instructions for preparing for the workshop if you don't already have Python, a command-line terminal, and a good plaintext editor working on your computer.