Project 2.3: Basic Webapp : CS 257

Goals

Learn about the technical infrastructure of a web application, so that you can better plan your project.
Set up skeleton code that implements a basic information flow: user → browser → web server → web application, and back.
Create a clean interface between the web application and your eventual data source. (In the end, the web application will talk to the database before responding back through the chain of command to the user.)

This page looks really huge, but the steps are mostly pretty straightforward. Please read through all the instructions before you dive in, so you have a sense of where you're headed.

Step 0: Set Up the Server

Follow the instructions in the Server Lab to get yourself set up with a clone of your Bitbucket repository right in the proper directory on our server, Thacker.

Step 1: Set Up the Skeleton

Before you start, try out my starter application from the user perspective. Visit tinywebapp.html and try entering stuff in the form. Notice that when you submit the form, the values you entered show up in the URL. That's because the values get submitted to the processing script (tinywebapp.py) by GET rather than POST.
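To see exactly what a GET submission puts in the URL, here's a small sketch using Python's standard urllib.parse module. The field names first_name and favorite_color are hypothetical stand-ins for whatever fields your form has.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields; a GET submission appends them to the script's URL.
form_values = {'first_name': 'Ada', 'favorite_color': 'dark blue'}

# urlencode builds the same kind of query string a browser would (spaces become '+').
query_string = urlencode(form_values)
url = 'tinywebapp.py?' + query_string
print(url)  # prints tinywebapp.py?first_name=Ada&favorite_color=dark+blue

# On the server, that text arrives in the QUERY_STRING environment variable;
# parse_qs turns it back into a dict mapping each name to a list of values.
parsed = parse_qs(query_string)
print(parsed['favorite_color'])  # prints ['dark blue']
```

With POST, the same name=value text would travel in the request body instead of the URL, which is why it doesn't show up in your address bar.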

If you directly visit tinywebapp.html in your browser, then the server sees that you've asked for an HTML page and sends it back to you, and then your browser renders it. If, on the other hand, you visit tinywebapp.py, you're asking the server to run the script, and it will return the output to your browser. (As it happens, tinywebapp.py crashes when provided no input, so we get an error when we try to run it. That's no good!)

I have a special script that's just made for displaying the source code of other files that sit on the server. By using this script, I can let you see the source code of the two files that make up my application: tinywebapp.html and tinywebapp.py.
Create copies of tinywebapp.html and tinywebapp.py in your local Git repository. You can either right-click and save the source-viewing links above (you'll have to correct the names, though), or you can copy-and-paste the source code into your text editor.
In order for the web server to run tinywebapp.py properly, it has to be marked as executable. On the command line (on your local machine), navigate to your repository directory and then do the following:
```
ls -l tinywebapp.py
chmod a+x tinywebapp.py
ls -l tinywebapp.py
```
The leftmost column of information when you run ls -l tells you the permissions on the file. Notice what changed: tinywebapp.py is now marked as executable (“x”) for all user categories.
Now that you have both tinywebapp.html and tinywebapp.py in your repository, and the permissions are set correctly for the Python script, add these files to source control, commit, and push.
SSH into Thacker (in a separate Terminal window or tab), navigate to your webapp directory (it should be as easy as typing cd 257_web), and then pull.
Load up http://thacker.mathcs.carleton.edu/cs257/CARL_USER/tinywebapp.html in your browser. Hooray! You've got a webapp!
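For the curious: the chmod a+x step above flips the three “x” bits that ls -l displays. Here's a sketch of the same operation using Python's standard os and stat modules. It runs against a throwaway temporary file rather than your real script, and the helper name isExecutableByAll is made up for illustration.

```python
import os
import stat
import tempfile

# Create a throwaway file to experiment on (a stand-in for tinywebapp.py).
fd, path = tempfile.mkstemp(suffix='.py')
os.close(fd)

def isExecutableByAll(file_path):
    """Return True if the user, group, and other execute bits are all set."""
    mode = os.stat(file_path).st_mode
    all_execute_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return (mode & all_execute_bits) == all_execute_bits

before = isExecutableByAll(path)

# Equivalent of `chmod a+x`: keep the existing permission bits, add execute for all.
current_mode = stat.S_IMODE(os.stat(path).st_mode)
os.chmod(path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

after = isExecutableByAll(path)
# stat.filemode renders the mode the way the leftmost column of ls -l does.
print(stat.filemode(os.stat(path).st_mode), before, after)

os.remove(path)
```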

Interlude: File Manipulation Through Git

Source-control tools like Git are experts at remembering the past. This is a great advantage sometimes, but other times it can be obnoxious. What if you want to delete a file from your webapp directory? If you just delete it from your working copy, Git still considers it part of the repository, so it will keep showing up in other copies of the repository, such as the one on Thacker.

The trick is to let Git know about the changes you want to make. Every change is considered a modification to the working copy, which can be added (staged for the next commit), committed (recorded in the local repository), and pushed (recorded in the remote origin repository). So here are a couple of handy tools.

To delete a file called foo.py that's currently in your repository:
```
rm foo.py
git add foo.py
git commit
git push
```
Here the add command is letting Git know that you want it to pay attention to the fact that you deleted the file; you're planning on committing that change. (Not every deletion is meant to be permanent; you might delete a file just to get a fresh copy back with git checkout -- foo.py. The add tells Git that you really mean to remove the file from the repository.)

There's a shorthand you can use instead, if you remember it:
```
git rm foo.py
git commit
git push
```
git rm is just a shorthand for a regular rm followed by a git add of that change.
To rename a file called foo.py to instead be bar.py:
```
git mv foo.py bar.py
git commit
git push
```
Here it's best to let Git know directly that you want to rename the file. git mv renames the file and stages both halves of the change (the removal of foo.py and the addition of bar.py) in a single step. Git doesn't store renames explicitly; it recognizes them by noticing that the contents match, which is why git log --follow bar.py can trace the file's history back through foo.py.

If, on the other hand, you did a manual mv foo.py bar.py, Git would at first see only that foo.py has gone missing and that an untracked file called bar.py has appeared. If you then git add bar.py and commit, foo.py is still in the repository, because you never told Git to remove it; the next pull on Thacker would give you both files. To finish the job by hand you'd also need to git rm foo.py, which is exactly what git mv does for you.

In both cases, you're only deleting or renaming the file in this commit and all those going forward. You can still totally check out older revisions of your repository, and the file will show up correctly as it did at the time that revision was committed.

Step 2: Modify the Webapp

tinywebapp is okay, but ultimately we'd like all parts of the application to be generated by scripts. Here's my second-generation example: webapp.py. Inspect the source code of webapp.py to see how I pull that off. I've separated the front-end presentation into a file called template.html, so that it's mostly just processing logic in webapp.py.
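Keeping the HTML in a separate template file means the Python side only has to fill in the blanks. I haven't reproduced webapp.py's actual approach here; as one possible sketch, Python's standard string.Template substitutes $placeholders. The placeholder names page_title and results and the function name buildPageFromTemplate are hypothetical, and the template is an inline string so the example runs on its own (in your app you'd read it from template.html).

```python
from string import Template

# Inline stand-in for the contents of template.html (hypothetical placeholders).
TEMPLATE_TEXT = """<html>
<head><title>$page_title</title></head>
<body>
$results
</body>
</html>"""

def buildPageFromTemplate(template_text, page_title, results_html):
    """Fill the template's $placeholders and return the finished HTML string.

    Raises KeyError if the template uses a placeholder we didn't supply.
    """
    template = Template(template_text)
    return template.substitute(page_title=page_title, results=results_html)

page = buildPageFromTemplate(TEMPLATE_TEXT, 'My Webapp', '<p>No results yet.</p>')

# A CGI script sends its HTTP header lines, then a blank line, then the body.
print('Content-type: text/html')
print()
print(page)
```

substitute() fails loudly on a missing value, which tends to surface template typos early; Template also has safe_substitute() if you'd rather leave unknown placeholders untouched.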

Make sure you read through the complete source code of webapp.py, especially the sanitizeUserInput() and printMainPageAsHtml() functions.
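I won't guess at exactly what my sanitizeUserInput() does; here's an independent, hypothetical sketch of the idea instead, assuming a policy of trimming whitespace, capping the length, and HTML-escaping the result before it's echoed back into a page. The name sanitizeInputSketch and the 100-character limit are made up for illustration.

```python
import html

MAX_INPUT_LENGTH = 100  # Hypothetical limit; pick one that suits your fields.

def sanitizeInputSketch(raw_value):
    """Return a version of raw_value that is safe to echo into an HTML page.

    Strips surrounding whitespace, truncates to MAX_INPUT_LENGTH characters,
    and escapes &, <, >, and quotes so user text can't inject markup.
    Returns '' if raw_value is None (e.g., the field was missing).
    """
    if raw_value is None:
        return ''
    trimmed = raw_value.strip()[:MAX_INPUT_LENGTH]
    return html.escape(trimmed, quote=True)

print(sanitizeInputSketch('  <script>alert("hi")</script>  '))
# prints &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;
```

Truncating before escaping matters: cutting the string after escaping could chop an entity like &amp; in half.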

Now, based on what you learn there:

Modify your web app so that it works like mine: have both the front page and the response page generated by a Python script called webapp.py. Make sure that no old files are hanging around; you'll want to delete or rename tinywebapp.html and tinywebapp.py.

To be explicit: your webapp should work correctly when accessed by the URL
```
http://thacker.mathcs.carleton.edu/cs257/CARL_USER/webapp.py
```
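One way to have a single script play both roles is to check whether any form input arrived: no input means show the blank form, and input means show the results. This is a hypothetical sketch, not a copy of webapp.py. It reads the GET query string straight from the QUERY_STRING environment variable with urllib.parse (rather than the cgi module, which was removed from the standard library in Python 3.13), and the field name search_term and all function names are made up.

```python
import html
import os
from urllib.parse import parse_qs

def getFormInput(query_string):
    """Parse a GET query string into a dict of field name -> first value."""
    parsed = parse_qs(query_string)
    return {name: values[0] for name, values in parsed.items()}

def buildFormHtml():
    """Return the HTML for the blank form, which submits back to this script."""
    return ('<form action="webapp.py" method="get">\n'
            '  <input type="text" name="search_term">\n'
            '  <input type="submit" value="Search">\n'
            '</form>')

def buildResultsHtml(form_input):
    """Return the HTML for the results page, given the parsed form input."""
    search_term = html.escape(form_input.get('search_term', ''))
    return '<p>You searched for: {0}</p>'.format(search_term)

def buildResponse(query_string):
    """Choose the page to show: the form if there's no input, else results."""
    form_input = getFormInput(query_string)
    if not form_input:
        return buildFormHtml()
    return buildResultsHtml(form_input)

def main():
    # The web server passes GET parameters in QUERY_STRING; default to ''.
    query_string = os.environ.get('QUERY_STRING', '')
    print('Content-type: text/html')
    print()
    print(buildResponse(query_string))

if __name__ == '__main__':
    main()
```

Note that parse_qs drops fields with empty values by default, so submitting the form with a blank search_term also brings you back to the form.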
Finally, add whatever HTML input elements you need to allow people to request the simplest service your application (the real one you're planning to build for Project 2) is intended to provide. Maybe you need radio buttons or checkboxes or dropdown lists or whatever seems suitable. Here's a tutorial on making forms, and to dive into the nitty-gritty you can check out the Mozilla Developer Network's detailed guide.
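If a dropdown's choices come from your data (say, a list of parties or candidates), it can be handy to generate the select element in Python rather than hard-coding it. Here's a hypothetical helper; the function name, field name, and option values are invented for illustration.

```python
import html

def buildDropdownHtml(field_name, options):
    """Return an HTML <select> element for a form.

    field_name: the name the browser will submit the choice under.
    options: a list of (value, label) pairs; both are HTML-escaped.
    """
    lines = ['<select name="{0}">'.format(html.escape(field_name))]
    for value, label in options:
        lines.append('  <option value="{0}">{1}</option>'.format(
            html.escape(value), html.escape(label)))
    lines.append('</select>')
    return '\n'.join(lines)

print(buildDropdownHtml('party', [('dem', 'Democratic'),
                                  ('rep', 'Republican'),
                                  ('other', 'Other & independent')]))
```

The value attribute is what gets submitted (and shows up in the query string); the label is what the user sees.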

The requirements for this part are as follows:

Good code style (including naming, commenting, indentation, camelCaseForFunctions, and variables_with_underscores).
Good software design (modularization, etc.).
Sanitization of CGI inputs.
Using the same script to present the form and also display the results.
Providing some very basic front-end feature of your planned app.

Interlude 2: Remote Editing

When you're debugging a web application, it can get really burdensome to edit your file, commit the change, push it, pull on the server, and then refresh your browser.

Three alternatives you might want to explore:

Editing directly on the server. You can use a command-line editor like nano, vim, or emacs to do this. Personally, I can't stand command-line editors for real editing jobs, but some people swear by them, and there's no denying their usefulness. If you find yourself making large changes, I'd suggest trying one of the other options.
Treating your server's repo as origin. Remember, the origin repository is just whatever you cloned from; there's nothing special about Bitbucket's servers except that they're carefully maintained and accessible from anywhere. You could instead clone from the repo that's on Thacker:
```
git clone thacker.mathcs.carleton.edu:257_web/ .
```
(If you're in the CMC, you can probably get away with just calling it thacker, rather than using the full name.)

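Now, whenever you push, your changes will appear immediately on Thacker. One catch: by default, Git refuses to push into the currently checked-out branch of a non-bare repository like the one on Thacker. Running git config receive.denyCurrentBranch updateInstead inside 257_web on Thacker (Git 2.3 or later) tells it to accept such pushes and update the working files, as long as Thacker's copy has no uncommitted changes. Just remember to eventually log into Thacker and push your changes to Bitbucket too!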
Transferring files directly. Rather than using Git (with which you have to commit every change, with a commit message, and then push), sometimes it's easier to just edit a file and then “push it” over to Thacker with less ceremony. The command scp is perfect for exactly this; the name stands for “secure copy”. Say I've edited foo.html, and now I want to copy my new version over to Thacker. It goes like this:
```
scp foo.html thacker.mathcs.carleton.edu:257_web/
```
(As above, if you're in the CMC, you can probably get away with just calling it thacker, rather than using the full name.)

Just be careful that you don't get too mixed up between source control and SCP! Use SCP to copy files over while you're debugging them, and then once you're certain you've got the right thing, commit it, push it, and then pull it down onto Thacker.

Interlude 3: Debugging

When your script crashes for any reason, by default the server just returns an HTTP 500 error, with no feedback about what went wrong. There are a few things you can do to make debugging less painful:

Try to run your script manually first. You can always invoke your script directly from the command line, right on your local machine. If your webapp is called foo.py, then you should be able to run it just by doing the following on the command line:
```
./foo.py
```
That's pretty much exactly what the server does when a user requests the script's URL in their browser. You may get CGI errors (since no form input is supplied when you invoke the script this way), but this will warn you about many problems. In particular, if your script has a syntax error in it, then cgitb can't help you, because your Python code will never run at all. Invoking your script directly from the command line, however, will report syntax errors.

Consider the possibility, also, that the copy of the script on the server may be different from the one you have locally. If things are pretty mysterious, try invoking the script from the command line on the server too.
Design your code for testing. Two of the primary things that help testability are code modularity and sealing off external dependencies. The next section talks about how to seal off dependencies on the database, but you can also do the same for CGI calls. If all your CGI calls are tucked away in small, specialized functions, you could easily code in alternate execution paths that use “mock” CGI interfaces or something else. Then you could import your program's source code into a test program, or run it from the command line with special parameters that change its behavior.
Use cgitb. The cgitb Python module will write most error messages into the output stream that gets returned to the client. You should only use this while you're debugging; it shows your source code and variable values to anyone who triggers an error, so it should be disabled in any “production code”. (cgitb was deprecated in Python 3.11 and removed in Python 3.13, so this works only if the server runs an older Python 3.) It's super easy to use; just put these lines near the top of your script:
```
import cgitb
cgitb.enable()
```
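Here's a minimal sketch of the “mock CGI” idea from the testing advice above, assuming your script gets its GET input from the QUERY_STRING environment variable. The only code that touches the environment is one small function, which accepts an optional stand-in dict; the function names and the search_term field are hypothetical.

```python
import os
from urllib.parse import parse_qs

def getCgiParameters(environ=None):
    """Return GET parameters as a dict of field name -> list of values.

    environ defaults to the real process environment (what the web server
    provides); tests can pass a plain dict instead as a "mock" CGI request.
    """
    if environ is None:
        environ = os.environ
    return parse_qs(environ.get('QUERY_STRING', ''))

def getSearchTerm(environ=None):
    """Return the (hypothetical) search_term parameter, or '' if it's absent."""
    values = getCgiParameters(environ).get('search_term', [''])
    return values[0]

# A "mock" request: just a dict standing in for the server's environment.
fake_request = {'REQUEST_METHOD': 'GET', 'QUERY_STRING': 'search_term=cats'}
print(getSearchTerm(fake_request))  # prints cats
print(repr(getSearchTerm({})))      # prints ''
```

The same environment trick works from the shell: QUERY_STRING='search_term=cats' ./foo.py runs your script as if that query had been submitted, provided the script reads its GET input from QUERY_STRING (as both this sketch and cgi.FieldStorage do).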

Step 3: The Database Interface

Your web application, whose entry point is webapp.py, will need to access your data to provide the desired services for people. Next week, we will get your PostgreSQL databases set up with your data. Regardless of the PostgreSQL details, in the meantime you're going to create a class to act as an interface between your main application code and the data. This is an important idea, so read carefully.

Suppose, for example, your dataset is federal election campaign finance data. Then you would create a file called data_source.py, containing a class more or less like this:

```
class DataSource:
    def __init__(self):
        """Constructor for the DataSource database interface class.
        """
        # You'll have to decide what kinds of arguments the constructor needs,
        # what data members it should set up, etc.

    def getCandidateList(self):
        """Returns a list of the id numbers of all candidates in the
        campaign finance database."""
        # Implementation will eventually go here. In the
        # meantime, just return an object of the right type.
        return []

    def getContributionsForCandidate(self, candidate_id):
        """What would the return type be here? A list of...what?
        """
        return []

    # ... and so forth.
```

The idea is that your application needs to ask certain idiosyncratic questions of your data. This class's methods will be designed to provide answers to those questions, regardless of where and how the data are stored. You could, for example, write one version of DataSource that assumes the data are stored in a CSV file, and another version that assumes the data are stored in a PostgreSQL database. By isolating the main application from the details of the data source, you get all sorts of great benefits, which you can undoubtedly imagine (and which we will discuss in detail in class during the next few days).
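To make the “swap the storage, keep the interface” idea concrete, here's a hypothetical CSV-backed take on the campaign finance example. The class name CsvDataSource, the column names candidate_id and amount, and the choice to return amounts as floats are all assumptions; the constructor takes any file-like object so the sketch can run on an in-memory string.

```python
import csv
import io

class CsvDataSource:
    """A DataSource variant that answers questions from CSV data.

    Assumes (hypothetically) one row per contribution, with columns
    candidate_id and amount.
    """

    def __init__(self, csv_file):
        """csv_file: any file-like object containing the CSV data."""
        self.contributions = list(csv.DictReader(csv_file))

    def getCandidateList(self):
        """Return a sorted list of the distinct candidate id numbers (ints)."""
        return sorted({int(row['candidate_id']) for row in self.contributions})

    def getContributionsForCandidate(self, candidate_id):
        """Return a list of contribution amounts (floats) for candidate_id."""
        return [float(row['amount']) for row in self.contributions
                if int(row['candidate_id']) == candidate_id]

def summarizeCandidates(data_source):
    """Application code: works with any object offering the same methods."""
    summary = {}
    for candidate_id in data_source.getCandidateList():
        amounts = data_source.getContributionsForCandidate(candidate_id)
        summary[candidate_id] = sum(amounts)
    return summary

sample_csv = io.StringIO('candidate_id,amount\n7,250.00\n3,100.00\n7,50.00\n')
print(summarizeCandidates(CsvDataSource(sample_csv)))  # prints {3: 100.0, 7: 300.0}
```

summarizeCandidates() never mentions CSV. A PostgreSQL-backed class offering the same two methods could be passed in instead without changing it at all.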

Please think carefully about what methods this class should have, and what their signatures should be (i.e. names, parameter lists, return values, and externally observable behavior). Document these methods in detail in a docstring below the def line before trying to implement them. Your documentation should succinctly explain the meaning and type of each parameter, the operation of the function, and the meaning and type of the return value. If the method might raise an exception, it's good to document the conditions under which that might happen as well.

As you can see in my example, I have prepared my DataSource class's methods as stubs. That is, they return empty lists or zero or whatever type of “nothing” is appropriate for the method in question. Alternatively, I could return dummy data by just hard-coding it in each stub's return statement.
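Here is what one fully documented stub might look like, covering the parameter's type, the return type, and the exception case, with hard-coded dummy data standing in for the real query. The return format (a list of dollar amounts as floats), the validation rule, and the dummy values are all invented for illustration.

```python
class DataSource:
    """Interface between the web application and the campaign finance data."""

    def __init__(self):
        """Constructor for the DataSource database interface class."""
        # Stub: a real version might open a database connection here.

    def getContributionsForCandidate(self, candidate_id):
        """Return the contributions made to one candidate.

        Parameters:
            candidate_id (int): the id number of a candidate, as returned
                by getCandidateList().

        Returns:
            A list of contribution amounts in dollars (floats). The list is
            empty if the candidate received no contributions.

        Raises:
            ValueError: if candidate_id is not a positive integer.
        """
        if not isinstance(candidate_id, int) or candidate_id <= 0:
            raise ValueError('candidate_id must be a positive integer')
        # Dummy data until the real database query is written.
        return [250.0, 100.0]
```

Because the dummy list has the documented type, the rest of your webapp can be written and tested against it now and keep working once the real query replaces it.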

Turn It In

Make sure your code doesn't have any absolute paths in it, and doesn't require any special uses of your username; the graders and I are going to clone your repository at your submission tag into a directory of our own to test it, so make sure that it'll work even if it's not in your own directory. (One way to test this is to have a partner clone it into their directory on Thacker. It should run exactly the same, with no modifications. If not, fix that up before you tag.)
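A common way absolute paths sneak in is opening template.html by a hard-coded path. One standard alternative, sketched here, is to locate files relative to the script itself via __file__, so the app works wherever the repository is cloned. The constant and helper names are hypothetical.

```python
import os

# Directory containing this script, wherever the repository was cloned.
SCRIPT_DIRECTORY = os.path.dirname(os.path.abspath(__file__))

def getPathInAppDirectory(file_name):
    """Return the full path of file_name, assumed to sit next to this script."""
    return os.path.join(SCRIPT_DIRECTORY, file_name)

template_path = getPathInAppDirectory('template.html')
print(template_path)
```

This avoids relying on the web server's current working directory, which isn't guaranteed to be your script's directory.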

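Once you've verified that your app is ready for submission, commit and push all your changes to Bitbucket, and then tag your commit with phase_2_3. Tags aren't pushed along with ordinary commits, so push the tag explicitly as well: git tag phase_2_3, then git push origin phase_2_3.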
