How to Build a Fast HTML Parser Using Regex and TypeScript

Elson
10 min read · Nov 13, 2023
Photo by Jay Zhang on Unsplash

While working on a side project, I needed to parse HTML. To save time, I tried the fastest HTML parsers I could find. After fighting with them and trying to hack around their limits, I realized I needed a custom or highly customizable parser to fit all of the project's needs. Unfortunately, I had no luck finding one. So, I created one.

I thought it was a simple enough thing to do…

The Motivation

For my specific project, I needed something fast, which was easy to find, but I also needed it to be customizable. Everything I found failed in two main areas:

  • They offered no way to tap into nodes while they were being parsed, which is something I desperately needed.
  • They offered no way to specify a custom API for the parsed result, forcing me either to learn whatever new API they came up with or to stay stuck with a poorly performing one. This ability would let me adapt the parser to the project, not vice versa.

Some parsers offer customization, but it often comes with a performance loss. I wanted both performance and customization. Additionally, I needed it to work in any JavaScript runtime environment, and because I was going to use it in a client library, it needed to be lightweight.
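
To make the two requirements listed above concrete, here is a minimal sketch of the kind of API they imply. It is not the parser this article builds: `parseHtml`, `ParseOptions`, `createNode`, `onNode`, and the `Mini` node shape are all hypothetical names chosen for illustration, and the regex is deliberately simplified (double-quoted attributes only, no comments, no special handling of script or style content).

```typescript
// Hypothetical sketch: every name here is illustrative, not the article's API.

interface ParseOptions<T> {
  // Builds a node in whatever shape the caller wants (custom result API).
  createNode: (tag: string, attrs: Record<string, string>, children: Array<T | string>) => T;
  // Called as soon as an element is complete (tap into nodes while parsing).
  onNode?: (node: T) => void;
}

interface Frame<T> {
  tag: string;
  attrs: Record<string, string>;
  children: Array<T | string>;
}

function parseHtml<T>(html: string, opts: ParseOptions<T>): Array<T | string> {
  // Alternatives: closing tag | opening or self-closing tag | text run.
  const token = /<\/([a-zA-Z][\w-]*)\s*>|<([a-zA-Z][\w-]*)((?:\s+[\w-]+(?:="[^"]*")?)*)\s*(\/?)>|([^<]+)/g;
  const root: Array<T | string> = [];
  const stack: Array<Frame<T>> = [];

  // Children list of the innermost open element, or the root list.
  const siblings = (): Array<T | string> =>
    stack.length > 0 ? stack[stack.length - 1].children : root;

  // Build the node with the caller's factory, notify the hook, attach it.
  const emit = (frame: Frame<T>): void => {
    const node = opts.createNode(frame.tag, frame.attrs, frame.children);
    if (opts.onNode) opts.onNode(node);
    siblings().push(node);
  };

  let m: RegExpExecArray | null;
  while ((m = token.exec(html)) !== null) {
    const closeTag = m[1], openTag = m[2], rawAttrs = m[3], selfClosing = m[4], text = m[5];
    if (text !== undefined) {
      siblings().push(text);
    } else if (openTag !== undefined) {
      const attrs: Record<string, string> = {};
      const attr = /([\w-]+)(?:="([^"]*)")?/g;
      let a: RegExpExecArray | null;
      while ((a = attr.exec(rawAttrs)) !== null) {
        attrs[a[1]] = a[2] === undefined ? "" : a[2];
      }
      const frame: Frame<T> = { tag: openTag, attrs, children: [] };
      if (selfClosing === "/") emit(frame);
      else stack.push(frame);
    } else if (closeTag !== undefined) {
      // Simplification: closes the innermost open element without
      // checking that the tag names match.
      const frame = stack.pop();
      if (frame) emit(frame);
    }
  }
  // Close anything still open at the end of the input.
  for (let frame = stack.pop(); frame; frame = stack.pop()) emit(frame);
  return root;
}

// Usage: the caller picks the node shape and observes nodes as they complete.
type Mini = { name: string; attrs: Record<string, string>; kids: Array<Mini | string> };
const seen: string[] = [];
const tree = parseHtml<Mini>('<ul class="list"><li>a</li><li>b</li><br/></ul>', {
  createNode: (name, attrs, kids) => ({ name, attrs, kids }),
  onNode: (node) => { seen.push(node.name); },
});
// seen is now ["li", "li", "br", "ul"]: children complete before their parent.
```

Making the node factory a generic parameter means the parser never commits to a node shape of its own, so adapting it to a project is a matter of passing a different `createNode`. The hook fires once per completed element, which is one plausible reading of "tapping into nodes while they are being parsed"; a real implementation might also want a hook on open tags.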


Elson

Software Engineer sharing knowledge, experience, and perspective from an employee and personal point of view.