Writing a small static site generator

There are, like, a hundred different static site generators written in Python (and even more written in other languages).

So I decided to write my own. Why? Well, I just kind of wanted to. I was looking to move my blog away from Ghost and wanted to keep things really minimalistic. I decided to use GitHub Pages to host the output since they recently announced support for SSL on custom domains.

Rendering content

Every static site generator needs to take some source format (like Markdown or reStructuredText) and turn it into HTML. Since I was moving from Ghost, I decided to stick with Markdown.

Since I recently integrated GitHub-flavored Markdown rendering into Warehouse, I decided to use the underlying library I made for that: cmarkgfm. Rendering Markdown to HTML with cmarkgfm looks something like this:

import cmarkgfm


def render_markdown(content: str) -> str:
    content = cmarkgfm.markdown_to_html_with_extensions(
        content,
        extensions=['table', 'autolink', 'strikethrough'])
    return content

cmarkgfm does have a convenience method called github_flavored_markdown_to_html, but it uses GitHub's tagfilter extension, which isn't desirable when I want to embed scripts and stuff into posts. So I just hand-picked the extensions I wanted to use.

Collecting sources

Okay, we have a way to render Markdown, but we also need a way to collect all of our source files. I decided to store all of my sources under ./srcs. We can use pathlib to collect them all:

import pathlib
from typing import Iterator


def get_sources() -> Iterator[pathlib.Path]:
    return pathlib.Path('.').glob('srcs/*.md')
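One thing worth knowing is that glob yields paths in no guaranteed order, and this pattern only matches files sitting directly inside the directory. Here's a minimal sketch of that behavior; collect_markdown is a hypothetical variant of get_sources, and the files are made up and live in a throwaway temporary directory:

```python
import pathlib
import tempfile
from typing import List


def collect_markdown(root: pathlib.Path) -> List[pathlib.Path]:
    # Same pattern as get_sources, but rooted at an arbitrary directory
    # and sorted so the order is predictable.
    return sorted(root.glob('srcs/*.md'))


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'srcs' / 'drafts').mkdir(parents=True)
    # Hypothetical files, just for illustration.
    (root / 'srcs' / 'first-post.md').write_text('# First')
    (root / 'srcs' / 'notes.txt').write_text('not a post')
    (root / 'srcs' / 'drafts' / 'wip.md').write_text('# WIP')

    found = [path.name for path in collect_markdown(root)]
    print(found)  # ['first-post.md']
```

The text file and the Markdown file in the nested drafts directory are both skipped, which is handy if you want a place to park unfinished posts.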

Frontmatter

Many static site generators have a concept of frontmatter: a way to set metadata and such for each source file. I wanted to support frontmatter that let me set a date and title for each post. It looks like this:

---
title: Post time
date: 2018-05-11
---

# Markdown content here.

There's a really nice and simple existing library for frontmatter called python-frontmatter. I can use this to extract the frontmatter and the raw content:

import frontmatter


def parse_source(source: pathlib.Path) -> frontmatter.Post:
    post = frontmatter.load(str(source))
    return post

The returned post object has a .content property that holds the post content, and otherwise acts as a dictionary for fetching the frontmatter keys.
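To make the format a little less magical, here's a rough sketch of the idea behind splitting frontmatter from content. This is not how python-frontmatter is implemented (it parses the metadata as YAML, for one thing); split_frontmatter is a hypothetical helper that only understands flat key: value lines:

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split '---'-delimited metadata from the body (simplified sketch)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != '---':
        # No frontmatter block, so everything is content.
        return {}, text

    metadata = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == '---':
            # Closing delimiter: the rest of the file is the body.
            body = '\n'.join(lines[index + 1:]).strip()
            return metadata, body
        key, _, value = line.partition(':')
        metadata[key.strip()] = value.strip()

    # The opening '---' was never closed, so treat it all as content.
    return {}, text


source = """---
title: Post time
date: 2018-05-11
---

# Markdown content here.
"""

metadata, body = split_frontmatter(source)
print(metadata)  # {'title': 'Post time', 'date': '2018-05-11'}
print(body)      # # Markdown content here.
```

Note that in this sketch the date stays a plain string. The real library hands the metadata to a YAML parser, so a value like 2018-05-11 comes back as a date object instead.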

Rendering the posts

Now that we have the post content and frontmatter, we can render them. I decided to use jinja2 to place the cmarkgfm-rendered post Markdown and frontmatter into a simple HTML template.

Here's the template:

<!doctype html>
<html>
<head><title>{{post.title}}</title></head>
<body>
  <h1>{{post.title}}</h1>
  <em>Posted on {{post.date.strftime('%B %d, %Y')}}</em>
  <article>
    {{content}}
  </article>
</body>
</html>
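The strftime call in the template relies on the frontmatter date being a date object rather than a string. As a quick sketch of what that format string produces, here it is with a hand-built date standing in for post.date:

```python
import datetime

# Stand-in for post.date from the frontmatter example.
post_date = datetime.date(2018, 5, 11)
formatted = post_date.strftime('%B %d, %Y')
print(formatted)  # May 11, 2018
```

%B is the full month name and %d is the zero-padded day, so the first of the month would come out as "May 01, 2018".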

And here's the Python code to render it:

import jinja2

jinja_env = jinja2.Environment(
    loader=jinja2.FileSystemLoader('templates'),
)


def write_post(post: frontmatter.Post, content: str):
    path = pathlib.Path("./docs/{}.html".format(post['stem']))

    template = jinja_env.get_template('post.html')
    rendered = template.render(post=post, content=content)
    path.write_text(rendered)

Notice that I store the rendered HTML in ./docs. This is because I configured GitHub Pages to publish content from the docs directory.
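One caveat: write_text fails if the output directory doesn't exist yet. Here's a small sketch of one way to guard against that. write_output is a hypothetical helper, not part of the code above, and the demo writes into a temporary directory rather than the real ./docs:

```python
import pathlib
import tempfile


def write_output(path: pathlib.Path, rendered: str) -> None:
    # Create the output directory (and any parents) if it's missing.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(rendered)


with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / 'docs' / 'hello.html'
    write_output(target, '<h1>Hello</h1>')
    written = target.read_text()
    print(written)  # <h1>Hello</h1>
```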

Now that we can render a single post, we can loop through all of the posts using the get_sources function we created above:

from typing import Sequence


def write_posts() -> Sequence[frontmatter.Post]:
    posts = []
    sources = get_sources()

    for source in sources:
        # Get the Markdown and frontmatter.
        post = parse_source(source)
        # Render the markdown to HTML.
        content = render_markdown(post.content)
        # Write the post content and metadata to the final HTML file.
        post['stem'] = source.stem
        write_post(post, content)

        posts.append(post)

    return posts

Writing the index

We can now render posts but we should also render a top-level index.html that lists all of the posts. We can do this with another jinja2 template and the list of posts returned from write_posts.

Here's the template:

<!doctype html>
<html>
<body>
  <h1>My blog posts</h1>
  <ol>
    {% for post in posts %}
    <li>
      <a href="/{{post.stem}}">{{post.title}}</a>
    </li>
    {% endfor %}
  </ol>
</body>
</html>

And here's the Python code to render it:

def write_index(posts: Sequence[frontmatter.Post]):
    # Sort the posts from newest to oldest.
    posts = sorted(posts, key=lambda post: post['date'], reverse=True)
    path = pathlib.Path("./docs/index.html")
    template = jinja_env.get_template('index.html')
    rendered = template.render(posts=posts)
    path.write_text(rendered)
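Sorting on the date key works because date objects compare chronologically. Here's a minimal sketch with plain dictionaries standing in for frontmatter.Post objects (the titles and dates are made up):

```python
import datetime

# Plain dicts standing in for frontmatter.Post objects.
posts = [
    {'title': 'Older post', 'date': datetime.date(2018, 4, 2)},
    {'title': 'Newest post', 'date': datetime.date(2018, 5, 11)},
    {'title': 'Oldest post', 'date': datetime.date(2017, 12, 25)},
]

# Same sort as write_index: newest first.
ordered = sorted(posts, key=lambda post: post['date'], reverse=True)
titles = [post['title'] for post in ordered]
print(titles)  # ['Newest post', 'Older post', 'Oldest post']
```

A post with no date in its frontmatter would raise a KeyError during the sort, so every source file needs one.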

Finishing up

All that's left now is to wire this all up using a main function:

def main():
    posts = write_posts()
    write_index(posts)


if __name__ == '__main__':
    main()

Check this out on GitHub

So the page you're reading now was rendered with this code! You can go and see the full source code for this, including syntax highlighting support, over at theacodes/blog.thea.codes.