Middleware for Kagi’s Small Web RSS Feed
Published 2024-08-16

Update: in the most meta of meta circumstances, this post itself is on Kagi’s feed: https://reader.miniflux.app/share/2653120942386c74541e48cb788c632eaf0d9372
tl;dr I made a fragile proxy of Kagi’s small web RSS feed because by default it’s too big for my RSS reader.
Here’s the background on the why for all this: I wanted to subscribe to the RSS feed for Kagi’s Small Web, but by default it’s way too big for my RSS reader (Miniflux) to handle. It clocks in at about 25 MiB.
As a quick aside: if you’re unfamiliar with the “small web,” it typically refers to non-commercial websites created by individual humans who do it because they can and want to, not for any financial gain. It evokes nostalgia for the early Internet days, before advertising was widespread. Grassroots. Community. In particular, emphasizing the unity in community. Read more on Kagi or Ben Hoyt’s website.
Kagi lists this feed in their API docs, but unfortunately I couldn’t find an obvious way to request any sort of pagination. So, it becomes a quickie DIY project.
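If you want to see the heft for yourself, here’s a quick sketch that fetches the feed and reports its size (the endpoint is the one from Kagi’s docs; the byte math is approximate):

// Rough size check of the full Small Web feed; endpoint from Kagi's API docs.
const body = await (await fetch("https://kagi.com/api/v1/smallweb/feed/")).text();
// Encode to UTF-8 so we count bytes, not UTF-16 code units.
const bytes = new TextEncoder().encode(body).length;
console.log(`~${(bytes / 1024 / 1024).toFixed(1)} MiB`);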
Well, I’ve recently discovered Val Town so I thought, why not, let’s keep the party going. Here’s the val I created: https://www.val.town/v/pinjasaur/smallweb
And here’s the code in its entirety:
import { DOMParser, Element } from "jsr:@b-fuze/deno-dom";

const getText = async (url: string) => (await fetch(url)).text();

export default async function(req: Request): Promise<Response> {
  // Bail on self-requests (e.g. Val Town's own UI) so browsing the val
  // doesn't trigger a download of the feed.
  if (req.headers.get("referer")?.includes("val.")) {
    return Response.json({ error: "Cannot access from val.town" }, {
      status: 400,
    });
  }

  // Return a 404 for favicon requests so Miniflux doesn't show an empty icon.
  if (new URL(req.url).pathname === "/favicon.ico") {
    return new Response(null, { status: 404 });
  }

  const params = new URL(req.url).searchParams;

  try {
    const feed = await getText("https://kagi.com/api/v1/smallweb/feed/");
    // Parse the Atom feed as HTML; deno-dom is forgiving enough for this.
    const parser = new DOMParser();
    const atom = parser.parseFromString(feed, "text/html");
    const $feed = atom.querySelector("feed")!;
    const entries = $feed.querySelectorAll("entry");
    // Strip every <entry>, then re-append only the first `limit` of them.
    Array.from(entries).forEach($entry => {
      $feed.removeChild($entry);
    });
    Array.from(entries).slice(0, parseInt(params.get("limit") ?? "", 10) || 25).forEach(($entry: Element) => {
      $feed.appendChild($entry);
    });
    return new Response(
      // Keep the feed's original first line (the XML declaration), then the
      // trimmed feed, massaged back toward valid XML.
      feed.split("\n")[0]
        + $feed.outerHTML
          .replace(/\s{2,}/g, "\n")
          // Hacky regex: re-close HTML-serialized <link href="..."> tags as
          // XML-compliant <link href="..."/>.
          .replace(/<link href="(.*)" ?>/g, "<link href=\"$1\"/>")
          // The first argument is, as best I can tell, a non-breaking space
          // (U+00A0), which renders like a plain space; normalize it to one.
          .replaceAll("\u00A0", " "),
      {
        headers: {
          // ?raw=1 serves text/xml so browsers pretty-print it.
          "content-type": params.get("raw") ? "text/xml" : "application/atom+xml",
        },
      },
    );
  } catch (err) {
    return Response.json({ error: err.message }, {
      status: 400,
    });
  }
}
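If you want to poke at it, hitting the deployed val looks something like this sketch (the hostname below is an assumption based on Val Town’s usual <user>-<val>.web.val.run naming, not necessarily the real URL):

// Assumed host, per Val Town's <user>-<val>.web.val.run convention.
const res = await fetch("https://pinjasaur-smallweb.web.val.run/?limit=10");
console.log(res.headers.get("content-type")); // application/atom+xml
console.log((await res.text()).slice(0, 120)); // first bit of the trimmed feed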
It’s incredibly fragile, but it technically works.
Noteworthy details:
- It bails on self-requests, e.g. when browsing Val Town, to prevent it from automatically downloading the RSS feed in the browser.
- I ended up checking for a request to `/favicon.ico` and returning a 404 to prevent an empty icon in the Miniflux UI.
- I spent way too long passing the feed through RSS feed validators to get it to a “good enough” state.
- Using the `DOMParser` API to do some “DOM” manipulation. Also, some incredibly hacky regex to make a `<link>` into an XML-compliant `<link />` (there’s a tiny demo of this after the list).
- And finally, the peace-da-resistance as they say, I added a `?limit=n` query parameter to optionally return only the `n` most recent entries in the feed. I defaulted this to 25, pretty much exclusively because that’s the page size Lobsters uses. Also, I added an optional `?raw=1` so it renders prettier in the browser.
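Here’s that tiny standalone demo of the `<link>` fix-up, using the same regex as the val (the href is a made-up example):

// Same regex as the val; the href is a made-up example.
const serialized = `<link href="https://example.com/feed">`;
console.log(serialized.replace(/<link href="(.*)" ?>/g, `<link href="$1"/>`));
// => <link href="https://example.com/feed"/>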
Again, fragile, but it works. I’ve been dogfooding it myself for the past couple days, so it really can’t be that bad, right?
I opted to create an issue on the GitHub repository. We’ll see what the verdict is—I’d personally love to take a stab at implementing it myself. I’d much rather have a feature like this in the upstream API versus some hacky middleware that I scraped together.
Go forth and prosper by supporting the small web & their collective RSS feeds.
🖖
I love hearing from readers so please feel free to reach out.
#js #programming #syndication #web #hack