I spent way too long building this blog

...and must therefore write about it to alleviate the guilt.

#Hiding from Dr. Jekyll

As you can probably tell, this blog has been inactive since the peak of March 2020, when I apparently reacted to the pandemic with a surge in creativity.

Back then, the blog was hosted on GitHub Pages, which used Jekyll to build and serve the blog from a set of markdown files. This was fine to start with, but it used a lot of unfamiliar tech (Ruby, Liquid, markdown extensions) that made it difficult to configure and maintain. It soon bit-rotted enough to discourage me from writing in the fleeting moments of inspiration.

I have been taking a break from programming for the past few months and as I return from it to start a job search, I thought I should clean up my online presence. So I decided it was finally time to replace the Jekyll setup.

My initial investigation revealed that GitHub now allows you to deploy pages using GitHub Actions, so you can pick from a plethora of static site generators. However, this time I wanted something very minimal that I could jump back into a year from now and not get bogged down reading outdated docs and updating dependencies.

So, I decided it was finally time to enter the rite of passage that is writing a tool to generate your website.

#Making a list, checking it twice

Such projects go a lot better if I have a clear idea of what I want from them. Otherwise, I spend way too much time playing with the tech and not enough building the thing that needs to exist.

I had a rough idea that I wanted a fast, minimal, modern site that introduces myself and hosts my writing. I imagined expanding it out over time to link to more details about my work and side projects.

My previous home page was a small 3D business card¹ that didn't link to my blog². The site and the blog were hosted via separate GitHub repos, where the home page was just a static site and the blog was generated by Jekyll.

There were a number of things I liked about the old blog: posts authored in markdown³, footnotes, math, syntax highlighting, heading permalinks, dates in URLs, an autogenerated RSS feed, and SEO and OpenGraph tags.

So I made a checklist of the features I wanted in the new setup and got cracking.

#First, we prototype...

I first wanted to see how feasible it was to write something that could generate a site with similar functionality without the same complexity as before.

My plan was to find a way to convert the markdown content of posts to html, wrap it in some scaffolding to form a valid html document and save that to a static directory that the public site could be served from. Even better if this could be done automatically in a GitHub Action.
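The scaffolding part of that plan can be sketched roughly as follows. This is a minimal illustration rather than the actual build script: the `Post` shape, the `renderDocument` and `outputPath` names, and the date-based path layout are all assumptions made for the example, and the markdown is assumed to have been converted to HTML already.

```typescript
// Hypothetical post shape; the field names are assumptions for this sketch.
interface Post {
  title: string;
  slug: string;
  date: Date;
  bodyHtml: string; // post content, already converted from markdown
}

// Escape text for safe use in HTML text nodes and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap the converted content in the scaffolding needed for a valid document.
function renderDocument(post: Post): string {
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    `<title>${escapeHtml(post.title)}</title>`,
    "</head>",
    "<body>",
    `<article>${post.bodyHtml}</article>`,
    "</body>",
    "</html>",
  ].join("\n");
}

// Zero-pad a month or day number to two digits.
function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

// Build an output path with the date in it (the exact layout is assumed).
function outputPath(outDir: string, post: Post): string {
  const yyyy = post.date.getUTCFullYear();
  const mm = pad2(post.date.getUTCMonth() + 1);
  const dd = pad2(post.date.getUTCDate());
  return `${outDir}/${yyyy}/${mm}/${dd}/${post.slug}.html`;
}
```

Writing `renderDocument(post)` to `outputPath("_site", post)` for every post would then produce a directory that any static file server can host.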

I have been learning Rust lately and using it for Advent of Code, so I briefly wondered if I should use it for this project as well. I decided against it quickly, because so far it hasn't proved great for prototyping and the ecosystem of markdown related tooling was much bigger in JS.

So I started with a node+TypeScript project. I used react-markdown (which uses unifiedjs and plugins) to parse the markdown posts, wrapped them in a component and serialized them to corresponding html files.

For styling, I like the utility file approach⁴ of writing out generic classes like .font-small and using those to style components. We used this approach at Airtable and it felt much more productive than the traditional approach of using CSS. I wrote my own utility file from scratch. I might switch over to a framework later, but I didn't want that overhead from the start.

With that, I had a tool that could generate blog posts to my liking and enough confidence that this was worth building out.

#...then draw rest of the owl

A bunch of the remaining work was just setting up the code as a script that could generate the full site with some configurable parameters like the host name (for local testing) and the posts directory (so I could store drafts separately). I'll skip those boring details, but there were a few fun things along the way:

#Deploying to GitHub Pages

I remembered finding the original process for deploying a GitHub Pages website quite confusing. Digging around into the source code of various GitHub actions demystified it a lot. I started with this action that deploys a static directory and modified it to run a build step first. This pretty much worked the first time, partly out of luck because I was placing my generated files in _site/ (a holdover from Jekyll) and the upload-pages-artifact action defaults to picking up the content from that folder.

#Generating preview images for OpenGraph

One of the useful things Jekyll generated was metadata for search engines and social networks to help surface your content properly. Those were fairly easy to figure out since I already had the relevant metadata from posts. While looking at other blogs for some inspiration, I noticed that Guillermo Rauch's blog had a <meta property="og:image" content="..."> tag as well, which Jekyll did not generate.

On opening the linked URL, I realized that this was meant for the preview image in social networks, such as this one for an issue on GitHub:

[Image: preview of a GitHub issue on a social network]

This was too charming to not implement. I knew that generating screenshots of websites on the backend is a pain⁵ because it usually involves either downloading a headless browser or a combination of DOM emulation and serialization libraries, neither of which screams maintainability.

However, on a closer look at rauchg's blog, I realized that it was using their Open Graph Image Generation edge function, which in turn uses Satori that has (limited) support for rendering DOM elements on the backend using wasm+JS. This worked like a charm⁶, with the only annoyance being that I had to duplicate some of my CSS as inline styles. I left a little TODO to figure out how to share these styles, perhaps via css-inliner.
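Once a preview image exists, the tags that point at it are simple to emit. Here is a rough sketch of generating them from post metadata. The `PostMeta` shape and the `openGraphTags` name are hypothetical, and the set of properties is just a common subset of the Open Graph protocol, not necessarily what this site emits.

```typescript
// Hypothetical metadata shape; field names are assumptions for this sketch.
interface PostMeta {
  title: string;
  description: string;
  url: string;
  imageUrl: string; // absolute URL of the generated preview image
}

// Escape a value for use inside a double-quoted HTML attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one <meta> tag per Open Graph property, one tag per line.
function openGraphTags(meta: PostMeta): string {
  const properties: Array<[string, string]> = [
    ["og:type", "article"],
    ["og:title", meta.title],
    ["og:description", meta.description],
    ["og:url", meta.url],
    ["og:image", meta.imageUrl],
  ];
  return properties
    .map(
      ([property, content]) =>
        `<meta property="${property}" content="${escapeAttr(content)}">`
    )
    .join("\n");
}
```

The resulting string would go inside the `<head>` of each generated post.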

#Generating permalinks to headings

One thing I've come to love from working at and using Quip is deep linking to documents (which lets you do so to any paragraph). It's a great way to share deeply embedded information in context and build out a knowledge graph. GitHub has a particularly nice approach to this where they automatically create anchor links for headings in the document, such as https://github.com/banga/git-split-diffs/tree/main#themes.

With the Jekyll setup, I was using some plugin that required manually writing out ids in curly brackets next to headings, like

# Smart Selection {#smart-selection}

to generate a permalink. I much prefer GitHub's approach. I looked at the plugins for react-markdown but didn't find anything that worked this way, so I wrote something based on a suggestion in an issue thread.

react-markdown allows you to override rendering of specific HTML tags via the components prop. I wrote a component that would replace the default renderer for heading elements, had it parse the contents of the heading, generate an id from that and then insert a link next to the heading. With a touch of CSS to only show these links on hover, this was ready to use.
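The id generation at the heart of that component might look something like this. It is a simplified sketch, not the component itself: `slugify`, `makeSlugger` and `renderHeading` are hypothetical names, the ASCII-only slug rule is an assumption that only loosely imitates GitHub's behaviour, and the output is a plain HTML string rather than a React element.

```typescript
// Turn heading text into an id: lowercase, drop punctuation, hyphenate spaces.
// ASCII-only for simplicity; non-Latin headings would need a broader rule.
function slugify(text: string): string {
  return text
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/\s+/g, "-");
}

// Create a per-document slugger that keeps ids unique by appending a counter
// to repeated headings (themes, themes-1, themes-2, ...).
function makeSlugger(): (text: string) => string {
  const seen = new Map<string, number>();
  return (text: string): string => {
    const base = slugify(text);
    const count = seen.get(base) ?? 0;
    seen.set(base, count + 1);
    return count === 0 ? base : `${base}-${count}`;
  };
}

// Render a heading with an id and a permalink next to it.
// Assumes `text` is already HTML-escaped.
function renderHeading(level: number, text: string, id: string): string {
  return `<h${level} id="${id}">${text} <a class="permalink" href="#${id}">#</a></h${level}>`;
}
```

Using one slugger per page keeps ids unique within that page, so two headings with the same text still get distinct links.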

There is a gotcha with these links called DOM clobbering: the headings carry an id attribute, which automatically exposes the element as a property on the window global. However, since I don't plan on hosting any external JS⁷ or sensitive content here, I'm giving it a pass.

#Rendering math

My last post made heavy use of math using $\LaTeX$ notation, which I wanted to preserve. Thankfully, there are multiple unifiedjs plugins that can parse and render math in markdown. I initially picked remark-math and rehype-mathjax, which rendered correctly on the web.

Later, when I was generating a blog feed, I noticed that the tags used by MathJax were not rendering in feed readers. This took some delicate fixing, because feed readers don't seem to support evicting their caches. So I had to dust off my old The Old Reader and NewsBlur accounts. I replaced rehype-mathjax with rehype-katex, which uses MathML tags that finally have cross browser support in 2023⁸ and seem to render fine in feed aggregators.

#Syntax highlighting code blocks

I shared a few code snippets before, which were highlighted by Jekyll by default using Pygments. I wanted to improve on that by rendering it in VS Code's dark theme, having written a tool to do the same on the terminal. I used react-syntax-highlighter, which delegates to Prism. It's not identical, but it ships with a theme called vsc-dark-plus that replicates VS Code's Dark+ theme. Loading this theme was a bit of a pain in my node ESM setup, partly because the types shipped with react-syntax-highlighter are wrong, but it worked fine after that.

Edit: I have since replaced react-syntax-highlighter with shikiji, which renders accurate VS Code themes.

#Rendering footnotes

My last post really went all out on all the GitHub extensions it could use, which included footnotes. Thankfully, these were easy to support via the remark-gfm plugin. It actually pulls in a number of features from GitHub Flavored Markdown apart from footnotes, such as strikethroughs, tables and task lists. These will come in handy in the future.

#Generating a blog feed

I use Feedly to follow other blogs and hopefully someone will want to follow mine some day. So I wanted to generate a feed for my posts as well. I sampled the feeds for a few blogs I read recently (v8, Keyhan's blog, Mihai's blog). Between RSS and Atom, I found the Atom format slightly easier to write. Other than the math rendering hiccup mentioned earlier, this worked easily.
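For a sense of how little an Atom feed needs, here is a minimal sketch of a generator. The `FeedInfo` and `FeedEntry` shapes and the `renderAtomFeed` name are assumptions for illustration. The elements used (id, title, updated, author, link, content) come from the Atom format, but this is not necessarily the exact set the site emits.

```typescript
// Hypothetical shapes; field names are assumptions for this sketch.
interface FeedEntry {
  title: string;
  url: string;
  updated: Date;
  contentHtml: string;
}

interface FeedInfo {
  title: string;
  siteUrl: string;
  feedUrl: string;
  author: string;
  entries: FeedEntry[];
}

// Escape text for XML element content and attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderAtomFeed(feed: FeedInfo): string {
  // The feed's own timestamp is that of its most recently updated entry
  // (the Unix epoch if there are no entries).
  const latest = feed.entries.reduce(
    (max, entry) => (entry.updated > max ? entry.updated : max),
    new Date(0)
  );
  const entries = feed.entries.map((entry) =>
    [
      "  <entry>",
      `    <id>${escapeXml(entry.url)}</id>`,
      `    <title>${escapeXml(entry.title)}</title>`,
      `    <link href="${escapeXml(entry.url)}"/>`,
      `    <updated>${entry.updated.toISOString()}</updated>`,
      // type="html" means the content is HTML carried as escaped text.
      `    <content type="html">${escapeXml(entry.contentHtml)}</content>`,
      "  </entry>",
    ].join("\n")
  );
  return [
    '<?xml version="1.0" encoding="utf-8"?>',
    '<feed xmlns="http://www.w3.org/2005/Atom">',
    `  <id>${escapeXml(feed.siteUrl)}</id>`,
    `  <title>${escapeXml(feed.title)}</title>`,
    `  <link href="${escapeXml(feed.siteUrl)}"/>`,
    `  <link rel="self" href="${escapeXml(feed.feedUrl)}"/>`,
    `  <updated>${latest.toISOString()}</updated>`,
    `  <author><name>${escapeXml(feed.author)}</name></author>`,
    ...entries,
    "</feed>",
  ].join("\n");
}
```

Because the content is embedded as escaped HTML, whatever markup the math renderer emits ends up in the feed verbatim, which is why the choice of renderer matters for feed readers.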

#Auto reloading

My last bit of indulgence in this project was to add auto reloading. My workflow so far had been to run the build script in watch mode via nodemon, serve the output using npx http-server ./_site and then hard refresh pages to see my changes.

This worked fine, but I have seen a few too many auto/hot reloading demos to be satisfied with it. A fast feedback loop during development and design makes iterative improvements that much easier. Reading a well formatted web page also beats reading markdown.

The usual approach for this works by having the client talk to a server (usually via a WebSocket), which watches the inputs and sends a message to the client when the relevant inputs change.

I didn't want to add the complexity of setting up my own server, so I took a different approach. My build script writes out a hash file at the end of the build containing the content hash of everything in the output directory. The client polls this file and reloads if the contents change. The only thing it relies on from the server is to not cache the contents of the static directory. This is achieved by passing the -c-1 flag to http-server.
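The build side of this can be sketched in a few lines of Node. This is an illustration under assumptions, not the actual script: the `build-hash.txt` file name, the function names and the choice of SHA-256 are all made up for the example.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively list all files under `dir`, sorted so the hash is stable.
function listFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...listFiles(fullPath));
    } else if (entry.isFile()) {
      files.push(fullPath);
    }
  }
  return files.sort();
}

// Hash the relative path and contents of every file in the output directory,
// skipping the hash file itself so that rebuilding does not change the hash.
function hashDirectory(dir: string, hashFileName: string): string {
  const hash = createHash("sha256");
  for (const file of listFiles(dir)) {
    const relativePath = path.relative(dir, file);
    if (relativePath === hashFileName) continue;
    hash.update(relativePath);
    hash.update("\0"); // separator between path and contents
    hash.update(fs.readFileSync(file));
  }
  return hash.digest("hex");
}

// Write the digest next to the generated site and return it.
// "build-hash.txt" is a hypothetical file name.
function writeBuildHash(dir: string, hashFileName = "build-hash.txt"): string {
  const digest = hashDirectory(dir, hashFileName);
  fs.writeFileSync(path.join(dir, hashFileName), digest);
  return digest;
}
```

Calling `writeBuildHash("_site")` at the end of each build is enough for the client side: a small script on each page can fetch the hash file on an interval, remember the first value it sees, and reload the page when a later value differs.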

I'm pretty pleased with the result as I type this and hit save.

#Too long, didn't read?

That's ok. This is mostly for my own documentation when I look back and wonder how to get this thing working again. I also found and corrected a couple of issues in the process of writing this.

It also dawned on me as I wrote this down that even a dinky little website like this relies on an absurd amount of engineering and creativity by often thankless volunteers.

So, if you are one of those who share their work for free, thank you. The source for this site is available at banga/banga.github.io.

¹ https://web.archive.org/web/20220203233535/https://banga.github.io/
² https://web.archive.org/web/20230126105442/https://banga.github.io/blog/
³ I generally don't like markdown as a storage format or API surface, largely because it's not really a standard, but it works fine for an authorship of one.
⁴ I don't know if there's an actual name for this style, is there?
⁵ Many web apps do this in their export and print flows, because rendering to an image is often more reliable and easier to customize than client-side printing in browsers.
⁶ Thank you vercel for open sourcing this and WASM for making it possible!
⁷ In fact, no JS at all so far.
⁸ See https://www.igalia.com/2023/01/10/Igalia-Brings-MathML-Back-to-Chromium.html. “This is the first example I know of that a major, major feature is really coming to the Web despite there not really being a business case for the business that normally advance the Web,” said Rick Byers of the Chrome team at Google. Many thanks to the volunteers!

Shrey Banga

banga.shrey@gmail.com