Check out git.mitxela.com for a demo.
I have become so accustomed to git over the last few years that I use it almost out of habit, even for small personal projects. For backing up, you can add anything as a remote, even another location on your hard drive. Since I have SSH access to this webserver, I sometimes use it as a remote for these personal projects. All it takes is to run git init --bare, add that folder as a remote, and you can push and pull from it at will.
These are normally half-finished or secret projects so I stick them in a private folder, but sometimes I want to share them. If you simply stick the bare git repo in a web-public directory, that's almost, but not quite enough to be able to clone from that URL. There are two protocols for serving git repos over HTTP, the "dumb" protocol and the newer "smart" protocol.
For the dumb protocol, all that's needed is to generate a couple of static files which give the client a few hints on what objects to retrieve. The command to do this is git update-server-info. This needs to be executed every time the repo changes, but that's easy because git provides a number of hooks for different events. If you look in the hooks directory of a git repo, the example post-update hook does exactly this: it updates the server info.
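Putting those pieces together, a minimal dumb-protocol setup looks something like this. The paths are illustrative (WEBROOT is a throwaway stand-in for something like /var/www/html), and the sample hook shipped with git already contains the update-server-info call:

```shell
WEBROOT=$(mktemp -d)   # stand-in for your web-public directory

git init --bare "$WEBROOT/myproject.git"
cd "$WEBROOT/myproject.git"

# Generate info/refs and objects/info/packs for dumb HTTP clients
git update-server-info

# The sample post-update hook already runs update-server-info;
# enabling it keeps those static files fresh after every push.
cp hooks/post-update.sample hooks/post-update
chmod +x hooks/post-update
```

After that, git clone https://example.com/myproject.git works with no server-side processing at all.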
The smart protocol loads git-http-backend as a CGI script and lets that handle everything, no post-update hook required. The protocol is a lot more efficient than the dumb one, and it also effortlessly lets us push via HTTP too, if we enable it. (I think it is also possible to push to the dumb protocol via WebDAV if your server supports that.)
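For reference, the Apache side of the smart protocol is a few lines of config. This is a hedged sketch adapted from the git-http-backend man page; the repo root and the location of the backend binary vary by distro:

```apache
SetEnv GIT_PROJECT_ROOT /var/www/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAliasMatch \
    "(?x)^/(.*\.git/(HEAD | info/refs | objects/info/[^/]+ | git-(upload|receive)-pack))$" \
    /usr/libexec/git-core/git-http-backend/$1
```

To accept pushes over HTTP, each repo also needs http.receivepack enabled (git config http.receivepack true), unless you have authentication set up, in which case the backend enables it for authenticated users automatically.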
The main advantage of using a service like Github is the web frontend they provide: being able to quickly see the state of the project, nice visuals of branching and merging, and easily viewing commits. Obviously there are also pull requests and issue tracking, but those aren't so important when you only have one or two people working on something. There are of course tools for visualizing a local repo, both graphical frontends for git and also terminal tools like tig, but these don't do quite the same thing, since the state of your local repo may not match the remote (and you may not want it to).
Gitlab is both a service and a piece of open-source software. It is billed as a lightweight web frontend for git, but evidently some people have a different idea of what "lightweight" means. I tried it out, and it eats up many gigabytes of RAM and lots of CPU time even when there isn't a single request. It probably scales well, when you have thousands of users accessing it at once, but for my needs it's incomprehensibly bloated.
What I've got is a small number of repos (<1000), each with a small number of commits (<1000) and a small number of authors, and importantly the changes are probably a lot less frequent than the pageviews for the web frontend. There are smaller, simpler alternatives to gitlab, such as gitweb and cgit; there are even PHP frontends, which have the benefit of being very easy to set up. But I was particularly taken with stagit, a static git page generator. As you can imagine, it generates a bunch of static html files describing the repo, which are then re-generated by the post-receive hook.
I really like stagit, but it's a compromise, because the functionality is never going to match that of a dynamic frontend. A page is generated for every file and for every commit on the main branch, which can mean a lot of files, but not enough to do arbitrary diffs. It also can't preview other branches, which is something I'd have liked. For at least one of my repos on github, I have an alternate branch, with a different readme file, that can be previewed by using the drop-down box. I suppose the best frontend would be a statically generated base, with client-side javascript routines filling in the functionality for less common operations. That probably exists, but I haven't looked for it.
I played around with the source code for stagit for a while, and eventually realized that all I really needed was the following: a summary page listing the latest commits, the readme file, the file tree and a list of branches and tags. Hence, web-git-sum. It's a single bash script, runs on the post-receive hook, and generates just two files, a summary page and a repo index.
Initialize or clone the repos you want with either the --bare or --mirror flag. It is customary to name the bare repo directory with a .git suffix.
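For example (the location and repo name are illustrative):

```shell
cd "$(mktemp -d)"    # stand-in for wherever your repos live

# Start a fresh bare repo, with the customary .git suffix:
git init --bare myproject.git

# Or mirror an existing one (URL illustrative):
# git clone --mirror https://github.com/user/myproject myproject.git
```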
Edit the description file in each repo if you want a description to be displayed.
Either copy or symlink the shell script into the hooks directory of each repo and name it post-update. If you want to invoke it manually, make sure you run it from the git directory, i.e. by calling hooks/post-update, as this is the environment the hook runs in.
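The wiring looks like this. To keep the sketch self-contained, a trivial stand-in script takes the place of the real web-git-sum.sh:

```shell
cd "$(mktemp -d)"
mkdir -p myproject.git/hooks

# Stand-in for the real web-git-sum.sh, just to show the wiring:
printf '#!/bin/sh\necho "summary generated"\n' > web-git-sum.sh
chmod +x web-git-sum.sh

# Symlink it in as the post-update hook (path is relative to hooks/):
ln -s ../../web-git-sum.sh myproject.git/hooks/post-update

# Invoke it manually from inside the git directory, as the hook would run:
( cd myproject.git && hooks/post-update )
```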
The first few lines of the script are configuration. If you want readme files to be parsed as markdown, you will need a markdown-parsing binary to be installed somewhere. I chose to use md4c's md2html utility. It is fast and lets me specify the flag for auto-links, to mimic the behaviour of GitHub-flavoured markdown. If you don't want markdown, you could just specify cat as the binary and surround the output in pre tags.
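The idea is something like the following sketch; the variable name here is my own, not necessarily the one web-git-sum actually uses, so check the script itself:

```shell
# Hypothetical configuration lines; adjust to the script's real variables.
markdown_bin="md2html --github"   # md4c's md2html with GFM extensions (auto-links etc.)
# markdown_bin="cat"              # plain-text fallback; wrap the output in <pre> tags
```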
I chose to create the summary page as an index.htm file within the git directory, and rewrite the URL using .htaccess directives. This is a little more complex than it needs to be, but it means that the folder structure is more elegant. The desired behaviour is that accessing the clone URL in a browser redirects to the pretty URL (without the .git) and displays the summary page, which is in reality stored in the git dir too.
RedirectMatch 302 ^/(.*)\.git/?$ /$1
RewriteCond %{REQUEST_FILENAME}.git/index.htm -f
RewriteRule (.*) $1.git/index.htm [END]
END is important so that it doesn't fight the http-backend CGI configuration. The regex for invoking the CGI script matches (\.git/.), with the final dot making sure it doesn't fight my first RedirectMatch.
A simpler set-up is to generate the summary page in the directory above, by changing the last part of the main block (just prior to generating the repo index) to be
} > "../${name}.htm"
Then you can either make the repo index point to those htm files directly, or rewrite them to not need the .htm suffix with something like this in your .htaccess:
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule (.*) $1\.htm [L]
In terms of permissions, most of the repo structure will need to be writeable by the apache user anyway if you plan to support pushing, so I recursively set the group for each repo folder to the apache user using chgrp -R. At the bare minimum, the html files will need to be writeable, so you could potentially set the permissions on just these files after running the script manually.
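Something along these lines; the group name depends on your distro (apache, www-data, httpd, ...), so in this self-contained sketch the current user's primary group stands in for it:

```shell
cd "$(mktemp -d)"
git init --bare myproject.git

# On the server you would use the web-server group, e.g.:
#   chgrp -R apache myproject.git
chgrp -R "$(id -gn)" myproject.git
chmod -R g+w myproject.git   # group-writeable so the CGI backend can write objects and html
```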
Once you have everything set up, you can make a script for adding a new repo. Mine looks something like this, although there's a bit more so I can support subdirectories with different permissions. If you're only using git for one thing on that server, you could potentially modify the default template for new repositories so everything is set up automatically.
#!/bin/bash -e

if [[ $# -eq 0 ]] ; then
    echo "Usage: $0 reponame"
    exit
fi

name="$1"
if [[ ! $name =~ (\.git$) ]]; then name+=".git"; fi

if [[ -d "$name" ]]; then
    echo "Error: $name already exists"
    exit
fi

mkdir "$name"
cd "$name"
git init --bare
git config http.receivepack true
ln -s ../../web-git-sum.sh hooks/post-update
chgrp -R apache .

read -r -p "Enter description? (y/N) " response
if [[ "$response" =~ ^([yY][eE][sS]|[yY])+$ ]]; then
    read -r -p "Description: " desc
    echo "$desc" > description
fi

echo "Success"
You can download or clone the script from git.mitxela.com or from github. I will probably continue to stick things on github and have git.mitxela.com be a mirror.