This blog is made using Astro. I’m writing its contents in Markdown or MDX (a combination of Markdown and JSX). Whenever I put a code block into a page, like this:
```js
// the hello world program
console.log('Hello World');
```
Astro renders that code block, using Shiki as the highlighting engine [1]:
// the hello world program
console.log('Hello World');
Shiki supports a myriad of languages, from COBOL to Git rebase messages. Among those languages I found ANSI.
ANSI?
In this case ANSI refers to ANSI escape codes. They are special combinations of characters that command line programs can use to control terminal behaviour. They can change cursor position, style text, or (if allowed) even copy contents to your clipboard. For web embedding purposes I mainly care about the text styling aspect.
Coloring or setting the weight (light, bold) of output is often used to make a program’s output easier to understand. If we compare the base output of ls:
$ ls
dir1 dir2 file1 file2
With the colored output using the --color flag:
$ ls --color
dir1 dir2 file1 file2
we can tell normal files (uncolored) from directories (blue) at a glance. So if we embed terminal output in our website, we of course want to keep this extra information.
Getting the underlying ANSI escape codes
Sadly, copying these escape codes out of our terminals isn’t as easy as I had hoped. When copying something from a terminal, what we get most of the time is just the text with all ANSI escape codes stripped. This makes sense, since most of the time when embedding or sharing our terminal output we have no environment that can interpret them.
We could of course do some trickery by capturing our shell outputs using a terminal multiplexer like tmux and then show the non-printing characters from our capture using cat with the -v option. That is way too many steps though. I want to just copy and paste, not remember to run tmux beforehand and all that fuss.
Luckily Kitty comes to the rescue! Kitty is a cross-platform terminal that distinguishes itself with a wide range of features, in this case support for copying the terminal output raw, including ANSI escape codes.
To allow us to do that, we just have to add a shortcut to the config. In my case I can replace the default copy shortcut, since I’m not using Kitty for anything else.
~/.config/kitty/kitty.conf
map ctrl+shift+c copy_ansi_to_clipboard
Now I can just copy from Kitty, paste into my ansi-marked code block and I’m golden:
ESC]133;AESC\ESC[92mgelaechterESC[39m@Arch ESC[32m~/D/TestESC[39m> ls --color
See those blue-marked ESC parts? Every ANSI escape code starts with the ASCII escape character. It tells the terminal that whatever comes next is a command to be interpreted rather than text to be displayed. It would not show up as a visible symbol in the browser, which is why I replaced it with the string ESC.
The type of ANSI escape code responsible for text formatting is called “Select Graphic Rendition” (SGR). Let’s look at the first one:
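To make that replacement concrete, here is a minimal Python sketch of how the escape character can be turned into a visible marker. The function name `show_escapes` is my own for illustration and not part of Astro, Shiki or Kitty; the sample input is the OSC sequence that shows up later in this post.

```python
ESC = "\x1b"  # the ASCII escape character (code point 27)


def show_escapes(text: str, marker: str = "ESC") -> str:
    """Replace every raw escape character with a visible marker string."""
    return text.replace(ESC, marker)


# The sample input is written with \x1b where the terminal emits a raw escape.
print(show_escapes("\x1b]133;A\x1b\\"))  # prints: ESC]133;AESC\
```

This is roughly what cat with the -v option does too, except that cat prints the escape character as ^[ instead of ESC.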
ESC[22;1;34m
ESC followed by [ is the Control Sequence Introducer (CSI). It tells the terminal that a control sequence follows; Select Graphic Rendition is one kind of control sequence.
ESC[22;1;34m
After the [ come the parameters, separated by semicolons. In the first escape code these are 22, 1 and 34:
ESC[22;1;34m
They tell the terminal:
22: Reset bold mode (all following text is no longer bold)
1: Set bold mode (all following text is now bold)
34: Set the foreground color blue (all following text itself is now blue)
The final m ends the sequence and identifies it as a Select Graphic Rendition:
ESC[22;1;34m
So what the first ANSI escape command ESC[22;1;34m actually does is: deactivate any previous boldness, then set the text blue and bold. [2]
So ls can tell your terminal that it wants to render the following dir1 blue and bold.
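The breakdown above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names `SGR_PATTERN`, `SGR_MEANINGS` and `explain_sgr` are made up for this example, and the lookup table covers just the three codes discussed here, while real terminals understand many more.

```python
import re

# ESC, then [, then digits and semicolons, then the final m.
SGR_PATTERN = re.compile(r"\x1b\[([0-9;]*)m")

# Only the three codes from the example above (illustrative, not complete).
SGR_MEANINGS = {
    22: "reset bold mode",
    1: "set bold mode",
    34: "set foreground color to blue",
}


def explain_sgr(sequence: str) -> list[str]:
    """Split one SGR sequence into its parameters and describe each one."""
    match = SGR_PATTERN.fullmatch(sequence)
    if match is None:
        raise ValueError("not a Select Graphic Rendition sequence")
    # An empty parameter is treated as 0.
    params = [int(p) if p else 0 for p in match.group(1).split(";")]
    return [
        f"{p}: {SGR_MEANINGS.get(p, 'not covered in this sketch')}"
        for p in params
    ]


for line in explain_sgr("\x1b[22;1;34m"):
    print(line)
```

Running it prints one line per parameter, in the order the terminal applies them.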
What my fish shell is producing
If you now compare the escape code I showed in the example with the first escape code of the text I copied into the code block:
ESC]133;AESC\
You might notice an important difference. Where we would expect a [ there is instead a ]. That is because the ANSI escape code we’re dealing with here is not a Select Graphic Rendition. Instead, the ] marks an Operating System Command (OSC).
So it would seem that while the naive approach might work for most setups, the fish shell, which I am using, adds some of these Operating System Commands, and Shiki can’t deal with that.
Why is fish doing that though? Operating System Commands are less standardized than control sequences. They are mostly used to control the terminal itself. For example, xterm lets you set the window title and icon name using OSC 0. So if a program’s output started with ESC]0;My TitleESC\, the xterm window would now be called “My Title”.
The OSC 133 that fish inserts seems to be a proprietary OSC for the FinalTerm terminal emulator, which is no longer maintained. FinalTerm was meant to be context aware, and the OSC 133 sequences were meant to provide it with this context.
OSC 133 A told it where the shell prompt started. OSC 133 B told it where it stopped. If you are interested in other OSCs the iTerm2 documentation has a lot of them.
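To summarize the difference, here is a small Python sketch that tells the two kinds of sequences apart by the character that follows ESC. The function name `classify` and its return strings are my own invention for this post; the sketch only distinguishes the two cases discussed here.

```python
ESC = "\x1b"


def classify(sequence: str) -> str:
    """Name the kind of escape sequence based on the character after ESC."""
    if len(sequence) < 2 or not sequence.startswith(ESC):
        return "not an escape sequence"
    introducer = sequence[1]
    if introducer == "[":
        return "control sequence"
    if introducer == "]":
        return "operating system command"
    return "other escape sequence"


print(classify("\x1b[22;1;34m"))          # what ls emits for a directory
print(classify("\x1b]133;A\x1b\\"))       # what fish emits before the prompt
print(classify("\x1b]0;My Title\x1b\\"))  # the xterm window title example
```

The first call reports a control sequence, the other two an operating system command.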
Getting rid of OSCs
Since Shiki apparently can’t deal with OSCs, the simplest course of action is to get rid of them. The OSCs in my output all start with ESC] and end with ESC\ (an OSC may also be terminated by the BEL character, which I don’t handle here), so we can use Kitty’s scriptability and sed, together with a regex, to remove them from the clipboard whenever we copy Kitty’s output.
Our regex would look something like this:
\x1b\][^\x1b]*\x1b\\
Matching the following:
1. ESC, expressed here as hexadecimal 0x1B, or \x1b in regex syntax
2. ], which needs to be escaped, so \]
3. [^\x1b]*, which matches anything except ESC any number of times, meaning we seek forward until we reach the next ESC
4. ESC\ (\x1b\\), which acts as the OSC’s terminator
NOTE
I’m explicitly matching non-ESC symbols in 3. instead of just matching everything in between lazily (.*?) because sed doesn’t support lazy matching. I could of course use Perl, but why bother if this works.
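To try the regex without touching the terminal config, here is a quick Python sketch that applies the same pattern. The names `OSC_PATTERN` and `strip_osc` are made up for this example, and the sample string is modelled on the prompt line I copied earlier, with every escape character written as \x1b.

```python
import re

# Same pattern as above: ESC ], anything that is not ESC, then ESC \
OSC_PATTERN = re.compile(r"\x1b\][^\x1b]*\x1b\\")


def strip_osc(text: str) -> str:
    """Remove every OSC sequence that is terminated by ESC backslash."""
    return OSC_PATTERN.sub("", text)


copied = (
    "\x1b]133;A\x1b\\"
    "\x1b[92mgelaechter\x1b[39m@Arch \x1b[32m~/D/Test\x1b[39m> ls --color"
)
cleaned = strip_osc(copied)

# Make the remaining escape characters visible before printing.
print(cleaned.replace("\x1b", "ESC"))
```

The OSC 133 marker at the start is gone, while the color codes around the user name and path are left untouched.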
Effectively this matches everything starting with ESC] and ending with ESC\, including everything in between. So we configure Kitty to remove every match using sed after copying to the clipboard:
~/.config/kitty/kitty.conf
# Disable the clipboard confirmation prompt
clipboard_control write-clipboard read-clipboard
# Define the regex and change the clipboard
env REGEX=\x1b\][^\x1b]*\x1b\\
map ctrl+shift+c combine :
    \ copy_ansi_to_clipboard :
    \ launch sh -c
    \ 'kitten clipboard --get-clipboard |
    \  sed -e "s|${REGEX}||g" |
    \  kitten clipboard'
What’s happening here:
combine : allows us to execute multiple Kitty actions separated by a colon (:).
We execute copy_ansi_to_clipboard first, then execute the launch action, which runs a command.
We run a new shell which first uses Kitty’s clipboard kitten to fetch the system clipboard. Kittens are Python scripts for interacting with Kitty. They are system agnostic, and if someone uses Kitty we know that they can use kittens!
We then pipe the clipboard contents into sed -e, which executes the following expression. Here is a breakdown:
s: substitute
|: is our chosen delimiter
${REGEX}: is what we’re matching
||: two delimiters with nothing in between is what we’re replacing with (in this case nothing)
g: globally, meaning replace every occurrence, not just the first
So yeah, essentially we’re just replacing every occurrence of our regex with nothing. Finally, we use the clipboard kitten again to write sed’s output back to the system clipboard.
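If the g at the end seems like a detail, this Python sketch shows what it changes. It is an analogy rather than what Kitty or sed run: Python’s re.sub replaces every match by default, and count=1 mimics sed without the g. The sample text with two OSC 133 markers is made up for illustration.

```python
import re

OSC_PATTERN = re.compile(r"\x1b\][^\x1b]*\x1b\\")

# Two OSC 133 markers around a prompt (illustrative sample, not real output).
text = "\x1b]133;A\x1b\\prompt> \x1b]133;B\x1b\\ls"

first_only = OSC_PATTERN.sub("", text, count=1)  # like s|...||
everywhere = OSC_PATTERN.sub("", text)           # like s|...||g

print(repr(first_only))  # the second marker is still in there
print(repr(everywhere))  # only the visible text remains
```

Without the global replacement, every marker after the first one would still end up in the code block.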
And that’s it. Now I can copy my terminal output and paste it into my code block.
gelaechter@Arch ~/D/Test> ls --color
dir1 dir2 file1 file2
Conclusion
Quite a lot of extra work, but we learned about ANSI escape codes. Oh, and also the main reason for doing this: This has great benefits for typesetting!
Compared to screenshots, we get actual selectable, copyable text that is rendered by your browser. Since Shiki is now interpreting the color codes, it can also change its colors dynamically, for example when switching to dark mode.
Footnotes
[1] My blog additionally uses Expressive Code to make the code blocks a little nicer than what base Astro provides.
[2] The reason for deactivating previous boldness in every escape code is probably a decision made in the implementation of ls to ensure a clean start state.