After a couple of dependency-free versions, I ended up adding the Neanderthal library to do faster matrix math. The different versions I wrote along the way are on GitHub, in case they’re helpful for anybody else who wants to do this in Clojure.
Neural networks are surprisingly easy to get started with. There’s significantly more “magic” inside a good concurrent queue implementation, for example, than inside a basic neural network to recognize handwritten digits.
For example, here’s the “hello world” of neural networks, a widget to recognize a hand-drawn digit:
Loading JavaScript...
See this widget at github.com/matthewdowney/clojure-neural-networks-from-scratch/tree/main/mnist-scittle
And here’s the code for the pixel array -> digit computation^{1}:
(defn sigmoid [n] (/ 1.0 (+ 1.0 (Math/exp (- n)))))
(defn feedforward [inputs weights biases]
  (for [[b ws :as _neuron] (map vector biases weights)]
    (let [weighted-input (reduce + (map * inputs ws))]
      (sigmoid (+ b weighted-input)))))
(defn argmax [numbers]
  (let [idx+val (map-indexed vector numbers)]
    (first (apply max-key second idx+val))))

(defn digit [pixels]
  (-> pixels (feedforward w0 b0) (feedforward w1 b1) argmax))
It’s striking that such a complicated task works without intricate code or underlying black-box libraries.^{2} I felt kind of dumb for not having known this already!
The three most helpful resources for me were:
3Blue1Brown’s video series on neural networks, with visualizations and intuitive explanations. Good initial context.
Michael Nielsen’s neural networks and deep learning tutorial, which uses Python and numpy.
Andrej Karpathy’s intro to neural networks and backpropagation, which is pure Python (no numpy), and was kind of a lifesaver for understanding backpropagation.
In retrospect, to get started, I’d recommend reading the first part of Nielsen’s tutorial, skipping to the Andrej Karpathy video, and then solving MNIST from scratch using those two things as references, before coming back to the rest of Nielsen’s material.
I also went through Dragan Djuric’s impressive and erudite deep learning from scratch to GPU tutorial series, but I can’t say I’d recommend it as an introduction to neural networks.^{3}
I’m glad I decided to start from scratch without any external libraries, including ones for matrix math.
I do, however, wish I’d watched Andrej Karpathy’s video before getting so deep into Nielsen’s tutorial, especially because of the backprop calculus^{4}, which I struggled with for a while. Karpathy’s REPL-based, algorithmic explanation was much more intuitive for me than the formal mathematical version.
My approach was to:
The training time for one epoch of MNIST was 400 seconds in the first two versions, 5 seconds in the third (on par with the Python sample), and down to 1 second in the final version.
I’m glad I broke it down like this. Would do again.
Before implementing the backprop algorithm, I built some unit tests for calculating the weight and bias gradients given starting weights and biases and some training data, and this turned out to be enormously helpful. I used Nielsen’s sample Python code to generate the test vectors.
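The same kind of check works at any scale: compare the analytic gradient against a finite-difference approximation of the cost. This is my own minimal sketch of the idea (one neuron, one weight, squared-error cost), not the post's actual test vectors:

```clojure
(defn sigmoid [n] (/ 1.0 (+ 1.0 (Math/exp (- n)))))

;; Squared-error cost for one neuron with weight w, bias b, input x, target y
(defn cost [w b x y]
  (let [a (sigmoid (+ b (* w x)))]
    (* 0.5 (- a y) (- a y))))

;; Analytic gradient: dC/dw = (a - y) * a * (1 - a) * x
(defn grad-w [w b x y]
  (let [a (sigmoid (+ b (* w x)))]
    (* (- a y) a (- 1 a) x)))

;; Central finite-difference approximation of the same gradient
(defn numeric-grad-w [w b x y]
  (let [eps 1e-6]
    (/ (- (cost (+ w eps) b x y) (cost (- w eps) b x y))
       (* 2 eps))))

;; The two values should agree to several decimal places
(let [[w b x y] [0.5 -0.25 1.5 1.0]]
  [(grad-w w b x y) (numeric-grad-w w b x y)])
```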
Finally, invoking numpy via libpython-clj at the REPL was useful for figuring out the equivalent neanderthal expressions.
A neuron in a neural network is just a function [inputs] -> scalar output, where the output is a linear combination of the inputs and the neuron’s weights, summed together with the neuron’s bias, and passed to an activation function.
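In code, with made-up weights and sigmoid as the activation (a sketch of mine, not taken from the trained network):

```clojure
(defn sigmoid [n] (/ 1.0 (+ 1.0 (Math/exp (- n)))))

;; [inputs] -> scalar output
(defn neuron [weights bias inputs]
  (sigmoid (+ bias (reduce + (map * weights inputs)))))

(neuron [0.5 -0.5] 0.1 [1.0 2.0]) ;; ≈ 0.401
```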
Much of the magic inside of neural network libraries has less to do with cleverer algorithms and more to do with vectorized SIMD instructions and/or being parsimonious with GPU memory usage and communication back and forth with main memory.
Neural networks can, theoretically, compute any function. And a more readily believable fact: with linear activation functions, no matter how many layers you add to a neural network, it simplifies to a linear transformation.
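A quick way to see the second fact: with the identity as the “activation”, two weight matrices collapse into their product. A toy check (my own example, ignoring biases):

```clojure
;; Multiply a matrix (vector of rows) by a vector, and two matrices
(defn mat-vec [m v]
  (mapv (fn [row] (reduce + (map * row v))) m))

(defn mat-mul [a b]
  (let [cols (apply map vector b)]
    (mapv (fn [row] (mapv (fn [col] (reduce + (map * row col))) cols)) a)))

;; Two "layers" with no activation function in between...
(def w1 [[1 2] [3 4]])
(def w2 [[0 1] [1 1]])
(def x  [5 7])

;; ...compute exactly what one layer with the combined weights computes
(= (mat-vec w2 (mat-vec w1 x))
   (mat-vec (mat-mul w2 w1) x))
;; => true
```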
But, the activation function is not necessarily all that squiggly — ReLU is just max(0, x), and it’s widely used.
Since I used Scittle to embed the Clojure code in this page, you can browse the source file directly. ↩
And sure, this is a rudimentary network architecture, and there’s a sense in which “the real program” is the magic weights and biases numbers in the w0, w1, b0, and b1 vectors, but it turns out that you can also write the training code from scratch to find those vectors without too much trouble. ↩
It is definitely an introduction to memory reuse tricks and GPU programming, for someone who already has a strong grasp of linear algebra, and wants to reinforce or deepen existing understanding of neural networks and relevant performance optimization. Which is crucial for deep learning in practice, but is a lot to take in at first. ↩
Also, on the indexes in Nielsen’s neural network backpropagation algorithm — the style in the Python sample starting on line 101 was hard for me to parse, with negative indexes and iterations using three indexes each. I found it helpful to rewrite like this:
# compute the difference between the output and the expected output
# this is the last layer's error
error = self.cost_derivative(activations[-1], y) * sigmoid_prime(zs[-1])

# weight and bias gradient vectors, same shape as the network layers
nabla_w = []
nabla_b = [error]

# the activations list has inputs prepended, so it's longer by 1
activations_for_layer = lambda layer_idx: activations[layer_idx + 1]

# iterate backwards through the layers
for layer_idx in xrange(len(self.weights) - 1, -1, -1):
    # compute a change in weights using the previous layer's activation
    prev_activation = activations_for_layer(layer_idx - 1)
    nabla_w.insert(0, np.dot(error, prev_activation.transpose()))

    # if there is a previous layer, compute its error
    if layer_idx > 0:
        this_layer_weights = self.weights[layer_idx]
        prev_layer_weighted_inputs = zs[layer_idx - 1]
        sp = sigmoid_prime(prev_layer_weighted_inputs)
        error = np.dot(this_layer_weights.transpose(), error) * sp
        nabla_b.insert(0, error)

return (nabla_b, nabla_w)
\[f^* = \frac{p}{a} - \frac{q}{b}\]Where \(f^*\) is the optimal portfolio allocation for probabilities of winning (\(p\)) and losing (\(q\)), and the win (\(b\)) and loss (\(a\)) fractions.
So if a stock trades at $100 and has a 60/40 shot of either climbing to $125 or falling to $75, the Kelly criterion says the optimal allocation \(f^*\) is 80% of the entire portfolio: \(f^* = \frac{0.6}{0.25} - \frac{0.4}{0.25} = 0.8\). In practice, people tend to cut this in half.
But if there is a 2% chance it goes to zero, a Monte Carlo simulation shows an optimal allocation closer to 0.39.
There isn’t a comparably simple formula to take a 2% chance of total loss into account, but this is a good approximation for a multi-outcome situation:
\[f^* \approx \frac{\mu}{\mu_2 - \mu^2}\]Where the expected return \(\mu = \sum_{i} p_i r_i\) and the expected squared return \(\mu_2 = \sum_{i} p_i r_i^2\).
This formula’s intuition is similar to the two-outcome version’s, since it’s also a ratio of edge to odds. The variance \(\mu_2 - \mu^2\) increases with tail risk.
I got this by irresponsibly inputting the probabilities and returns into the investment fraction formula under assumptions of geometric Brownian motion in place of a single asset’s historical behavior, and then checking it against Monte Carlo in various situations.
For the same situation, take a series of probabilities \(p = [0.60, 0.38, 0.02]\) and corresponding returns \(r = [0.25, -0.25, -1.00]\) to get:
\[\mu = p \cdot r = 0.035\]and
\[\mu_2 = p \cdot r^2 = 0.08125\]so \(f^* \approx 0.4373\). Close, but not quite a perfect fit with the Monte Carlo simulation.
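The arithmetic is easy to check:

```clojure
;; mu = p·r, mu2 = p·r², f* ≈ mu / (mu2 - mu²)
(let [p   [0.60 0.38 0.02]
      r   [0.25 -0.25 -1.00]
      mu  (reduce + (map * p r))
      mu2 (reduce + (map * p r r))]
  [mu mu2 (/ mu (- mu2 (* mu mu)))])
;; f* comes out ≈ 0.437, matching the figure above
```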
A simple, analytical solution feels just out of reach. The two-outcome Kelly formula isn’t terribly hard to derive from first principles, so maybe the three-outcome version isn’t that much harder.
To derive the two-outcome formula, start by taking the compound growth rate. The compound growth rate for some bet fraction \(f\) is:
\[g(f) = (1 + f b)^p \cdot (1 - f a)^q\]for a portfolio which grows by fraction \(b\) with frequency \(p\) and decreases by fraction \(a\) with frequency \(q\). To find the bet fraction which maximizes the compound growth rate, take the first derivative \(g'(f)\), set it equal to zero, and solve for \(f\)^{1}.
The compound growth rate for the three outcome version, with some risk of ruin \(r\), is:
\[g(f) = (1 + f b)^p \cdot (1 - f a)^q \cdot (1 - f)^r\]Taking \(G(f) = log(g(f))\) you get
\[p log(1 + f b) + q log(1 - f a) + r log(1 - f)\]The derivative \(G'(f)\) is:
\[\frac{p b}{1 + f b} - \frac{q a}{1 - f a} - \frac{r}{1 - f}\]So far so good, but the final step of finding a neat formula for \(f^*(p,q,r,b,a)\) is a treacherous endeavor.
On the other hand, \(G'(f^*) = 0\) is tractable if you pass in all of the other parameters.
Revisiting the 60/38/2 example:
\[G'(f) = \frac{(0.60)(0.25)}{1 + 0.25f} - \frac{(0.38)(0.25)}{1 - 0.25f} - \frac{0.02}{1 - f}\]Setting \(G'(f) = 0\) and solving for \(f\) is not the most pleasant thing, but you get:
\[f = \frac{93 \pm \sqrt{3049}}{100} = 0.3778 \text{ or } 1.4822\]Madness? Maybe, but the first result is extremely close to the Monte Carlo simulation’s 0.39.
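Without an algebra solver, the root is also easy to double-check numerically. A sketch of mine, bisecting \(G'(f)\) for the 60/38/2 example (it's positive at 0 and negative near 1, so a single root lies between):

```clojure
;; G'(f) for p = 0.60, q = 0.38, r = 0.02, b = a = 0.25
(defn G' [f]
  (- (/ (* 0.60 0.25) (+ 1 (* 0.25 f)))
     (/ (* 0.38 0.25) (- 1 (* 0.25 f)))
     (/ 0.02 (- 1 f))))

;; Simple bisection for a root of a decreasing function
(defn bisect [f lo hi]
  (let [mid (/ (+ lo hi) 2.0)]
    (if (< (- hi lo) 1e-9)
      mid
      (if (pos? (f mid))
        (recur f mid hi)
        (recur f lo mid)))))

(bisect G' 0.0 0.99) ;; ≈ 0.3778
```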
The nice thing about the two-outcome formula is that you can do it in your head.
Probably there is somebody who can solve the \(f^* \approx \frac{\mu}{\mu_2 - \mu^2}\) heuristic in their head, but the precise representation is way too much.
One way to think about it is that if you’re going to use a computer anyway, you might as well use the precise representation. On the other hand, it’s unwieldy unless the number of probabilities and outcomes is known in advance, and involves an algebra solver in any case. So the heuristic solution has a certain appeal.
Or you could just cut the two-outcome formula in half and call it a day.
Take the logarithm \(G(f) = log(g(f))\) and go from there, to make the calculus easier. The maximum point for \(G(f)\) is the same as the maximum point for \(g(f)\). This has nothing to do with the supposed logarithmic subjective utility of wealth, but they are sometimes confused. ↩
The Kelly criterion finds the bet size to maximize growth over a series of bets. I won’t introduce it in detail, since others have already done so (see here, here and here).
E.O. Thorp was never an armchair risk-taker: when he was working on these ideas, he started by counting cards and wagering on equines before generalizing, and he formalized the approach of betting less than the theoretical optimum. Common sense agrees. This is called “fractional Kelly” generally, or “half Kelly” when betting exactly half the optimal amount.
I started simulating these kinds of bets, interested in something like: what properties would have to be true of ourselves and the things that we tend to bet on^{1} for half Kelly to be a useful heuristic?
I ended up simulating a few things in turn:
These topics have been covered in (perhaps too much) depth in the literature, and in bits and pieces on blogs. My small addition is a series of interactive simulations to test your intuition against.
If you bet on an outcome that you think is ~70% likely, one way to think about the winning probability is as a number drawn from a normal distribution centered around 70%, with some standard deviation, say 5%.
\[p \sim \mathcal{N}(\mu = 0.70, \sigma = 0.05)\]In which case, there is uncertainty, and, for example, 68% of the time your winning probability is between 0.65 and 0.75. For reference, the Kelly bet (f*) for 70/30 chances is 0.40, but for 65/35 it is 0.30.
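For even-payout bets the Kelly formula reduces to \(f^* = p - q = 2p - 1\), so those reference numbers are one-liners:

```clojure
;; Kelly fraction for an even-payout bet (a = b = 1)
(defn kelly-even [p] (- (* 2 p) 1))

[(kelly-even 0.70) (kelly-even 0.65)] ;; ≈ [0.40 0.30]
```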
If overbetting is worse than underbetting (as Thorp shows; more on that below), then increasing uncertainty reduces the optimal bet size, even if you have the right mean. This looks like a good reason for uncertainty to affect bet sizes.
But have a look at this Monte Carlo simulation of 100 portfolios, each making 100 wagers with chances of winning drawn from a normal distribution around p(win). Drag the bet size and uncertainty bars around.
Loading JavaScript...
Setting the uncertainty σ to 5% moves the optimal bet size from 0.40 to 0.38. At σ = 20%, it decreases to 0.36. These are relatively small moves for significant changes in uncertainty. (Click the green links to update simulation parameters.)
Uncertainty matters, but apparently not that much. Only setting σ to extremes, like 50%, produces a significant change in optimal bet size^{2}.
Thorp’s explanation is that it is not primarily uncertainty, but a background tendency to overestimate the chances of winning, and therefore to overbet, which justifies a partial Kelly strategy.
He shows that overbetting is indeed worse than underbetting, and that betting half-Kelly offers protection against a negative growth rate (from overbetting) at the cost of reducing growth rate by, in this case, <= 25%.
In other words, there’s asymmetry in your favor when reducing bet size from full to half of the theoretical optimal.
Pages 28 and 29 of “The Kelly criterion in blackjack, sports betting, and the stock market”.
I’m convinced that uncertainty is not as important as I thought, but only weakly convinced that the apparent usefulness of partial Kelly is due to systematic overestimation of winning probability.
If everyone were underestimating a permeating, Talebian risk of ruin, that might explain the appeal of partial Kelly betting.
Take Credit Suisse’s AT1 securities. They were bond-like things which paid a 9.75% coupon with very high probability, but they were also (apparently) the first backstop for depositors if bank capital were to become insufficient. So a bet size based on Fermi estimates of the relative value of the AT1s in different interest rate environments would require serious modification to account for even a 1% risk of going to zero.
Or take a stock whose price is 100. If you think there’s a 60/40 shot it either climbs to 125 or falls to 75, the Kelly bet size is \(f* = {0.60 \over 0.25} - {0.40 \over 0.25} = 0.8\). But factoring in a 1% risk of ruin yields an optimal bet of 0.46^{3}, and a 2% risk of ruin further decreases it to 0.39.
Loading JavaScript...
This is more persuasive. However, it depends on a combination of low assumed downside risk and proportionally small actual risk of ruin. So while I think risk of ruin is important, I’m not totally convinced it’s the main driver of the apparent usefulness of fractional Kelly strategies. Plenty of Kelly bets are made assuming a total loss in the downside case anyway, and those bettors still utilize fractional Kelly strategies.
Here’s another explanation: maybe our revealed preference is not to maximize the growth of the median portfolio, but to maximize the growth of the 10th percentile portfolio^{4}.
In a Monte Carlo simulation of portfolios following the same strategy, the mean is important, but one’s true preference might be closer to something like “I want to follow a strategy where I’d make money in 9 / 10 hypothetical worlds”.
Take the same 70/30, even payout bet, this time optimizing for percentiles other than the 50th:
Loading JavaScript...
Drag the nth percentile control around. Optimizing for the 10th percentile, or the 20th, yields a very different bet size than optimizing for the 50th percentile.
Full Kelly has an interesting property: there is an X% chance of your bankroll dropping to X% of what you started with^{5}. A 50% chance of a 50% drawdown is a lot to stomach. Maybe we’d rather not have optimal growth.
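That property is easy to spot-check with a quick simulation of my own (not one of the widgets above): bet full Kelly (f = 0.40) on the 70/30 even-payout wager and count the portfolios that ever fall to half their starting bankroll.

```clojure
(defn half-drawdown-frequency
  "Fraction of simulated full-Kelly portfolios (70/30 even-payout bet,
  f = 0.40) that ever drop to half their starting bankroll."
  [seed trials max-bets]
  (let [rng     (java.util.Random. seed)
        f       0.40
        halved? (fn []
                  (loop [n max-bets, wealth 1.0]
                    (cond
                      (<= wealth 0.5) true
                      (zero? n)       false
                      :else (recur (dec n)
                                   (if (< (.nextDouble rng) 0.70)
                                     (* wealth (+ 1 f))
                                     (* wealth (- 1 f)))))))]
    (double (/ (count (filter true? (repeatedly trials halved?)))
               trials))))

;; Comes out in the rough vicinity of one half, as the claim predicts
(half-drawdown-frequency 42 2000 500)
```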
In his post on the Kelly criterion, Zvi notes that full Kelly is only correct if you know your edge and can handle the swings. He also notes that you don’t know your edge, and you can’t handle the swings.
This is a compelling explanation for the fractional Kelly heuristic, because it explains large downward adjustments in the bet fraction. Here too, the adjustment depends on the odds ratio, though:
This makes sense, because the distribution of final returns has higher or lower variance depending on whether the bets have higher or lower variance.
To get a better feel for this dynamic, consider a 3d plot of optimal bet sizes for wagers with the same expected value (EV) and different probabilities of winning, across three surfaces with different EVs.
So for example the point at p(win) = 0.75 on the 1.5 EV surface represents an even bet (you 2x your wager if you win, 0x if you lose), but at p(win) = 0.5, the odds change such that you 3x if you win, to maintain the same EV.
Loading JavaScript...
One way to think about the Kelly criterion is as recognition of the slope along the bet size ⇔ p(win) curve for a fixed expected value. Perhaps some part of the driving intuition behind fractional Kelly, beyond uncertainty about where one sits on the first curve, is the recognition that there is also slope along the percentile axis.^{7}
I came away from this downgrading the importance of pure uncertainty relative to other sources of risk, and above all, with the impression that fractional Kelly is overdetermined. Kelly betting is a fascinating topic, and I enjoyed reading about it in:
Obviously the implications go beyond betting in the strictly conventional sense; any situation that involves something ventured and something gained is a wager. ↩
Since \(\mu = 0.70\), setting \(\sigma = 0.50\) means that the number drawn from the distribution is often greater than one. In any case, the simulation must bound the probability inside \([0, 1]\), and setting the standard deviation like this introduces some skew. The probability at +σ is better by 0.3, but at -σ it’s worse by 0.5. So even at high levels of uncertainty, it’s reasonable to imagine that the effective change to the mean, not the uncertainty, is the cause of the change in optimal bet. ↩
The return chart gets a little jagged here, so it’s hard to say for sure. Likely in the range of 0.45 - 0.60. Increasing the number of simulated bets and portfolios would help, but I’d rather spare your browser :). The 0.39 bet size at a 2% risk of ruin looks much more precise. ↩
Or, realistically, to maximize the growth of the median portfolio, subject to the constraints that the 10th percentile portfolio not (1) lag by too much or (2) lose money. Here the 10th percentile stands in for some lower-than-median percentile return; it’s not clear to me that one of e.g. 10th vs 20th has a strong intuitive appeal that the other lacks. ↩
Over a long enough series of bets, etc. See stat.berkeley.edu/~aldous/Real_World/kelly. ↩
That is, you wager 1 and are returned 5 if you win, which corresponds to \(b = 4\) in
\[f* = \frac{p}{a} - \frac{1-p}{b}\]where \(a = 1\) is the fraction of the wager lost and \(b\) is the fraction won, in addition to the return of the initial wager. ↩
As a thought experiment, if you took half Kelly as dogma, and assumed that downside risk mitigation explained its utility, you could find the surface on this plot that intersects each EV surface wherever the optimal bet size is half Kelly. In this highly stylized version of reality, the intersection would indicate what kind of world we’re likely to live in (not one with EVs frequently much larger than 1.5, or preferences to optimize all the way up to the mean, it would appear).
Unfortunately the precision of the “half” modifier is not high enough, and the effects of other factors not small enough, for this to be useful. ↩
The repository is here: https://github.com/matthewdowney/rendergpt.
The extension is still pending review by the Chrome Web Store, but anyone who wants to use it can try it out by following the instructions in the repository’s README.
It has been particularly useful to me for drawing SVGs.
ChatGPT describes the functionality (kudos to anyone who can guess the prompt I used):
This wild creation sniffs out code blocks in the ChatGPT conversation, adding a “render” button to those tagged as HTML, JavaScript, or PlantUML.
The button morphs the code block into an <iframe> displaying the code, allowing you to mix and match your sources. HTML, JavaScript, and CSS code blocks can intertwine like a psychedelic dance.
Ask for tweaks to JavaScript or CSS, and you can often witness the beast spitting out just the function or the particular CSS needed for modification. Add these new elements to the initial iframe, and watch your creation come to life, one step at a time.
This may have been born out of pure insanity, but it has served me well. And now, as the shadow of OpenAI’s plugins grows, I eagerly anticipate the polished creations that shall emerge from the minds of others, taking us further down the rabbit hole.
Since SBF is allergic to thinking through bet sizes[1][2] (a fact that once struck me as discordant with his Bayesian gestalt) — or maybe because I’ve been indulging in a slow read through Red-Blooded Risk — the Kelly Criterion has been on my mind, just like everyone else’s.
Last week, claims trading at 10¢ seemed undervalued in light of the seizure of 3.5bn by the Bahamas regulator. But of course the 3.5bn turned out to be largely fabricated. A solid number requires liquid assets, which is both obvious and funny. If the assets are not liquid, they are not solid.
This week’s recrudescence is the purported recovery of 5bn, announced in a US bankruptcy court. Entertain me in some hand waving towards a 45¢ recovery if true, 10¢ recovery if false. At 15¢ per claim, the implied odds of the 5b figure’s veracity are 1 in 7^{1}.
That seems low. I’d bet (but how much?) that it’s closer to a coin toss, in which case, the Kelly bet would be 125%^{2}, a figure which feels impossibly large. Though the rule of thumb is to cut it in half, and if the claims trade up to 20¢, it diminishes to a comparatively sober 60%.
I also noticed that it’s possible to achieve similar changes in bet size by assuming, for example, a 5¢ rather than 10¢ recovery in the pessimistic scenario, without changing the expected value by more than a few cents.
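The two-outcome bet sizes above can be checked with the standard formula \(f^* = p/a - q/b\), where \(a\) and \(b\) are the loss and win fractions relative to the entry price:

```clojure
(defn kelly [p-win loss-frac win-frac]
  (- (/ p-win loss-frac) (/ (- 1 p-win) win-frac)))

;; Pay 15¢, 50/50 shot at 10¢ or 45¢: lose 5/15 or win 30/15 of the stake
(kelly 0.5 (/ 5.0 15.0) (/ 30.0 15.0)) ;; ≈ 1.25

;; At a 20¢ entry with the same payouts: lose 10/20 or win 25/20
(kelly 0.5 (/ 10.0 20.0) (/ 25.0 20.0)) ;; ≈ 0.60
```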
Anyhow, I made a Monte Carlo simulator for a three-outcome version of this problem, more to develop intuition around how ternary outcomes and small parameter changes affect optimal bet size than for any ideas around FTX claim betting, per se, since the edge for this particular trade is probably dominated by information asymmetry.
Instructions: adjust the probabilities and claim prices. Click “Optimize bet size” to find the bet size with the highest median return^{3} among the simulated portfolios. Note that increasing the simulated number of bets increases the time horizon that the strategy plays out over, and affects the proportion of portfolios that end up winning in the long run.
Assume that FTX has 10b in outstanding liabilities. There is 1b of cash already recovered.
Optimistically, the 5b of liquid assets (including the original 1b of cash) plus the less-liquid assets sum to USD 5.5b. Pessimistically, the original billion stands, but the 5b is a fantasy, and liquidators manage to scrounge up another 1b, for a recovery of USD 2b.
So, call it 45¢ or 10¢ on the dollar, assuming 10¢ on the dollar goes to the trustee either way. Then the probability weights which cohere with the market price are: \((10¢ * {6 \over 7}) + (45¢ * {1 \over 7}) = 15¢\).
Note the many sources of possible error, here. The outstanding liabilities might be more than 10b. The probabilities might be wrong. The trustee might charge more than usual for such an exotic case. Etc. ↩
If you pay 15¢ with a 50 / 50 shot of getting either 10 or 45¢ back, you either lose 5¢ or gain 30¢, and you compute the Kelly bet \(f*\) with:
\[f* = {p_{win} \over 5¢:15¢} - {p_{lose} \over 30¢:15¢} = 150\% - 25\% = 125\%\]Or at a 20¢ claim price with the same payouts:
\[f* = {50\% \over 10¢:20¢} - {50\% \over 25¢:20¢} = 100\% - 40\% = 60\%\]Equivalently, a firm whose whole portfolio was on FTX might be justified in selling 40 to 20% (half to full-Kelly fractions) of their claims even if they’re undervalued at 20¢, which is a neat way to explain how market prices might be too low in the presence of involuntary, concentrated positions, even with total information symmetry.
And of course betting half-Kelly is recommended because “the Kelly strategy marks the boundary between aggressive and insane investing” and all that. Though in this situation one might be sincerely tempted to go out in the same manner that FTX lived, and YOLO it. There are certainly philosophical reasons to do so, in addition to the dreary pragmatic ones. ↩
You could also optimize for other things, like how many of the simulated portfolios have a positive return after some number of bets, which yields different results. ↩
A TypeScript library with Python, Java, and C# bindings, which represents cloud resources as objects and defines relationships among them. This library is used to construct a tree of resources, which can be “synthesized” into a cloud assembly. The cloud assembly is a self-contained directory that includes the CloudFormation templates and supporting assets, which collectively represent your infrastructure.
A CLI tool for working with the cloud assembly directory. The CLI tool can deploy cloud assembly to AWS and update or destroy deployments, all via AWS CloudFormation. It also invokes the program that you wrote to build the cloud assembly, although this function is ancillary.
Since there are Java bindings (albeit generated from the TypeScript source), you can use Clojure to build the cloud assembly that the CLI tool then uses as input.
I considered some other approaches for orchestrating non-trivial AWS deployments, and came away with the following impressions:
Clojure usage is broadly similar to the Java usage described in the docs:
$ mkdir cdk-app
$ cd cdk-app
$ cdk init --language java
$ rm -rf src/ pom.xml
Add a Clojure deps.edn file referencing the Java AWS CDK:
{:paths ["src/"]
 :deps  {org.clojure/clojure                {:mvn/version "1.11.1"}
         software.amazon.awscdk/aws-cdk-lib {:mvn/version "2.33.0"}}}
(ns core)

(defn synth
  "Synthesize cloud assembly to the dir at the CDK_OUTDIR env var."
  [& _])
{"app": "clj -X core/synth", ...}
Now if you cdk synth, you’ll see an error that there’s nothing in the cdk.out/ directory. At this point, just follow the AWS CDK Java documentation to fill in the body of core/synth with some code that will generate cdk.out, and you’re good to go.
See this code sample [github gist] for more information. It’s from my project, and out of context, so you might have to read between the lines a bit.
From here, I followed along with this tutorial from Nathan Peck, translating his TypeScript examples to Clojure while filling out the synth function. The final version of his TypeScript CDK code is here.
Note that, while one could translate literally from TypeScript and subclass the Java CDK classes with proxy…

(proxy
  [software.amazon.awscdk.App]
  [(-> (software.amazon.awscdk.AppProps/builder) (.build))])

…the recommended Java idioms will of course feel nicer.
(ns core
  (:import (software.amazon.awscdk App Stack)))

(defn synth [_]
  (let [app (App.)]
    ; Register a stack under the App root
    (Stack. app "MyStackId")
    ; ... register more things under the created stack ...
    ; Synthesize the app (the root of the tree of constructs) to cdk.out
    (println "Synthesized to:" (.getDirectory (.synth app)))))
Instead of cdk synth, you can build a cloud assembly from the REPL
;; In this case, it's helpful to set the output dir explicitly
(let [app (App. (-> (AppProps/builder) (.outdir "cdk.out") .build))]
  (Stack. app "MyStackId")
  (println "Synthesized to:" (.getDirectory (.synth app))))
and use the --app flag to point the CDK CLI at an existing directory and skip the synth stage.
$ cdk list --app cdk.out/
I think the best case would be the ability to do this from Babashka. This might require a custom pod, though.
After playing around a bit, I found it extremely helpful to read the documentation on the CDK’s core concepts.
Notes from the documentation
(reduce + bins) is ~1.
Consequently, programmers are very good at discretizing continuous mathematics. But the inverse is also valuable!
Sigmoid functions are good for elegantly describing some intuitions that might otherwise be clumsily represented with prolific branching. Specifically, intuitions of the “gradually, then suddenly” variety.
By intuition I mean something like: you’re a market maker buying and selling an asset, and if you were controlling things manually you’d bias your trading long (you think it’s going up!), but at the same time if the market is selling to you too eagerly, you might get the feeling that you should back off a bit (what do they know that you don’t?), not completely closing out your position, but buying slightly less enthusiastically and selling a little more aggressively, to flatten your exposure.
Since you’re constantly buying and selling, at any given time you might be short a couple hundred or long a couple hundred, depending on a lot of fuzzy and entropic factors that you generally don’t think that much about (or at least when you do, only in aggregate). You normally just course correct by raising or lowering prices a tiny bit when you want to increase your likelihood of buying or selling to correct back towards your position target, unless you start to “feel” like something is off, in which case you take more aggressive action.
A discrete algorithm might:
A continuous approach produces a better model: use something like arctan(x), mapped onto a domain of possible position sizes, and a range of possible price adjustments. Center the domain around a slightly positive number (to introduce your long bias), and you’re good to go.
On either side of the slightly positive bias (x = 100) is arctangent over -π to +π, transformed to fit some range of positions (-400 to +100 on the left, +100 to +500 on the right) and arbitrary price skews from +0.0030% to -0.0030%.
Whether this lovely distillation of the “gradually, then suddenly” intuition is enough to turn a profit is a separate question!
You can even capture some particular “temperament” of response — map from arctangent domains sized π vs 2π or 3π for relatively cool-headed and hot-headed responses.
The implementation might even be smaller, and more general.
;; The basic shape of the sigmoid function
(defn atan'
  "Arctangent, but squished onto a field where x, y ∈ [0, 1]."
  [atan-domain]
  (let [shift (/ atan-domain 2)
        y-shift (Math/atan shift)
        y-range (* 2 y-shift)]
    (fn s-curve [x]
      (/ (+ (Math/atan (* (- x 0.5) atan-domain)) y-shift) y-range))))

;; Map any 1x1 curve shape onto a differently shaped field
(defn onto-field [f & {:keys [domain range]}]
  (let [[min-x max-x] domain
        [min-y max-y] range]
    (fn [x]
      (let [x-% (/ (- x min-x) (- max-x min-x)) ; % through f's domain
            y-% (f x-%)] ; proportionate % through f's range
        (+ (* y-% (- max-y min-y)) min-y)))))

;; Functions for the left & right hand sides of the chart, corresponding to the
;; position sizes to compute price skew for.
(onto-field (atan' Math/PI) :domain [-400 +100] :range [+30 0])
(onto-field (atan' Math/PI) :domain [+100 +500] :range [0 -30])
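Evaluating the left-hand curve at a few positions shows the mapping working (the definitions are repeated so the snippet stands alone; left-skew is my own name for it):

```clojure
(defn atan'
  "Arctangent, but squished onto a field where x, y ∈ [0, 1]."
  [atan-domain]
  (let [shift (/ atan-domain 2)
        y-shift (Math/atan shift)
        y-range (* 2 y-shift)]
    (fn s-curve [x]
      (/ (+ (Math/atan (* (- x 0.5) atan-domain)) y-shift) y-range))))

(defn onto-field [f & {:keys [domain range]}]
  (let [[min-x max-x] domain
        [min-y max-y] range]
    (fn [x]
      (let [x-% (/ (- x min-x) (- max-x min-x))
            y-% (f x-%)]
        (+ (* y-% (- max-y min-y)) min-y)))))

(def left-skew (onto-field (atan' Math/PI) :domain [-400 +100] :range [+30 0]))

;; Max skew when deeply short, half skew at the domain midpoint,
;; and no skew at the +100 position target
[(left-skew -400) (left-skew -150) (left-skew +100)] ;; ≈ [30.0 15.0 0.0]
```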
Not only is this model’s chart satisfyingly more squiggly than that of the discrete model, it also works much better (in markets with price-sensitive participants, anyway). I find this pretty cool — and it’s hard not to wonder if there are other situations where transcribing the intuition behind an algorithm is actually easier than just switching over several inflection points.
However: caution in domains with low signal to noise ratios. See Ernie Chan, and uhm, Ernest Hemingway, on bankruptcy and nonlinear models.
I got to step 6 of the algorithm described on the Wikipedia page before running out of vocabulary to Google the math that I needed to implement.
It is a modular multiplicative inverse. If it were s = k^-1 mod n it would be straightforward enough to Google ‘mod inverse’, but there’s an extra step.

s = k^-1 * x (mod n)
  = (k^-1 mod n) * x (mod n)
It’s actually clearer in code, I think.
(defn modular-multiplicative-inverse
"Find x in [0, p) such that (m * x) % p = n"
[n m p]
(let [n (biginteger n)
m (biginteger m)
p (biginteger p)]
(-> (.modInverse m p)
(.multiply n)
(.mod p))))
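For example (small numbers of my own choosing, easy to check by hand), finding x such that (3 * x) % 7 = 5:

```clojure
;; The inverse of 3 mod 7 is 5 (since 3 * 5 = 15 ≡ 1), so
;; x = (5 * 5) mod 7 = 4 -- and indeed (3 * 4) mod 7 = 5.
(modular-multiplicative-inverse 5 3 7)
;=> 4
```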
At the time, the advice that I got was:
private final String key = System.getenv("API_KEY");
private final String secret = System.getenv("API_SECRET");
It wasn’t until those API keys started having access to more money that I began to rethink the convenience of environment variables.
Instead of environment variables – which are accessible from other processes (that’s the point, right?) and could feasibly end up in a debug log – I’ve adopted the following workflow:
1. Keep the API keys in an encrypted file on disk.
2. To add or update a credential, decrypt the file, assoc in the new key & secret, encrypt it again, and write it to disk.
3. At startup, use (.readPassword (System/console)) to securely read in the passphrase, and then use it to decrypt the key file and read it into a Clojure map.
4. Instead of passing the key map around (allowing it to potentially escape into a debug log, or be printed at the REPL if I do something dumb), the top level code of my application passes the credentials into a signer-factory for each API that closes over the credentials.
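To make the assoc step concrete (the map shape here is my own illustration, not prescribed by the post), the decrypted file is just an EDN map from API name to credentials, so adding one is a plain assoc before re-encrypting:

```clojure
;; Decrypted, the key file is just a Clojure map, e.g.:
(def api-keys
  {:bitstamp {:key "AAAA" :secret "BBBB"}})

;; Adding a credential before re-encrypting is a one-liner:
(assoc api-keys :bitmex {:key "CCCC" :secret "DDDD"})
;=> {:bitstamp {:key "AAAA", :secret "BBBB"}, :bitmex {:key "CCCC", :secret "DDDD"}}
```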
;; The factory is shaped something like this
(defn request-signer-factory
  [{:keys [key secret]}]
  (fn [request-to-sign]
    (sign-request request-to-sign key secret)))
;; Then an API endpoint looks like this
(defn place-order!
  [signer {:keys [price qty side market post-only?]}]
  (let [request (comment "Format the order data for the exchange")
        signed (signer request)]
    (do-http-request! signed)))
I like this workflow more than others which are centered around only encrypting credentials inside of your Git repository, and decrypting them when you clone / pull, because it means that not even on my development machine are keys just sitting around in plaintext.
To skip straight to the implementation source, head over to this gist.
The reason that I can call (2) “convenient” with a straight face is that it’s easy to have an interface that’s similar to how git commit with no -m flag works.
It gets the password, pops up the system’s default text editor (vim for me), and then encrypts & writes the editor contents once you close it:
(defn edit-keys!
"Run from a terminal (maybe via a lein alias) to edit the
encrypted API keys with the system's default text editor."
[]
;; Implementation for each of these in the next section
(-> (read-keys)
(update-password?)
(edit-keys)
(write-keys))
(System/exit 0))
I’m going to use funcool/buddy-core for the cryptography. On top of buddy’s API, we can build key stretching, encryption, and decryption in just a few lines of code:
(require '[buddy.core.codecs :as codecs]
'[buddy.core.nonce :as nonce]
'[buddy.core.crypto :as crypto]
'[buddy.core.kdf :as kdf])
(import '(java.util Base64))
(defn bytes->b64 [^bytes b] (String. (.encode (Base64/getEncoder) b)))
(defn b64->bytes [^String s] (.decode (Base64/getDecoder) (.getBytes s)))
;; Take a weak text passphrase and make it brute force resistant
(defn slow-key-stretch-with-pbkdf2 [weak-text-key n-bytes]
(kdf/get-bytes
(kdf/engine
{:key weak-text-key
;; Keep this constant across runs
:salt (b64->bytes "j3gT0zoPJos=")
:alg :pbkdf2
:digest :sha512
;; Target O(100ms) on commodity hardware
:iterations 1e5})
n-bytes))
(defn encrypt
"Encrypt and return a {:data <b64>, :iv <b64>} that can be
decrypted with the same `password`."
[clear-text password]
(let [initialization-vector (nonce/random-bytes 16)]
{:data (bytes->b64
(crypto/encrypt
(codecs/to-bytes clear-text)
(slow-key-stretch-with-pbkdf2 password 64)
initialization-vector
{:algorithm :aes256-cbc-hmac-sha512}))
:iv (bytes->b64 initialization-vector)}))
(defn decrypt
"Decrypt and return the clear text for some output of `encrypt`
given the same `password` used during encryption."
[{:keys [data iv]} password]
(codecs/bytes->str
(crypto/decrypt
(b64->bytes data)
(slow-key-stretch-with-pbkdf2 password 64)
(b64->bytes iv)
{:algorithm :aes256-cbc-hmac-sha512})))
(-> (encrypt "some clear text" "my password")
(decrypt "my password"))
;=> "some clear text"
Have a look at this gist for a step by step on how you might use this in a project, or just read the clj source here to dive straight in:
How far do we want to take it? Should we clear the API keys from memory?
In my case, this doesn’t make sense — my request signer needs to hold on to the keys and sign requests throughout the lifetime of the application.
If, on the other hand, you were working with something where you just needed to use the keys once, at startup, you could minimize the time that keys spend in memory by (1) making sure they’re never Strings, only char[]s, and (2) writing over the char[]s as soon as you’re done with them.
See: explanation of how this works on the JVM.
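A minimal sketch of that pattern (my own example; use-password-once here is a hypothetical stand-in for whatever one-shot consumer needs the key material):

```clojure
(import '(java.util Arrays))

;; Hypothetical one-shot consumer of the key material,
;; e.g. deriving a signing key at startup.
(defn use-password-once [^chars pw]
  (count pw))

;; Read the passphrase as a char[] -- in real code via
;; (.readPassword (System/console)) -- use it once, then overwrite it
;; so the plaintext doesn't linger on the heap until GC.
(let [password (char-array "hunter2")]
  (try
    (use-password-once password)
    (finally
      ;; Overwrite before the array becomes garbage
      (Arrays/fill password (char 0)))))
```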
Security is a spectrum — you can decide how far you want to go. But at the very least, prefer client-side encryption to environment variables — it’s easy, especially in Clojure.
The old post is below, for posterity.
The (Netty) WebSocket client in ztellman/aleph is the best one I’ve used in Clojure.
As one StackOverflow commenter notes:
It can take some time to get used to the asynchronous style and aleph’s core abstractions
Well, how about wrapping it in something that provides a familiar, callback-based interface?
To be clear, this is often worse than just using aleph’s API. But, if this is what’s discouraging use of this library, here it is:
(require '[aleph.http :as http])
(require '[manifold.stream :as s])
(require '[manifold.deferred :as d])
(defn ws-conn
"Open a WebSocket connection with the given handlers.
All handlers take [sock msg] except for :on-connect, which only takes [sock]
-- `sock` being the duplex stream.
- :on-connect Called once when the connection is established.
- :on-msg Called with each message.
- :on-close Called once upon socket close with a map {:stat _, :desc _}.
The optional :aleph parameter is configuration to pass through to Aleph's
WebSocket client."
[uri & {:keys [on-connect on-msg on-close aleph]}]
(let [sock (http/websocket-client uri aleph)
handle-messages (fn [sock]
(d/chain
(s/consume (fn [msg] (on-msg sock msg)) sock)
(fn [sock-closed] sock)))
handle-shutdown (fn [sock]
(let [state (:sink (s/description sock))]
(on-close
sock {:stat (:websocket-close-code state)
:desc (:websocket-close-msg state)})))]
(d/chain sock #(doto % on-connect) handle-messages handle-shutdown)
@sock))
(defn ws-send [sock msg] (s/put! sock msg))
(defn ws-close [sock] (s/close! sock))
Here’s how you’d use it to print out tick data from Bitstamp:
(def sub-msg
(str "{\"event\":\"bts:subscribe\",\"data\":"
"{\"channel\":\"live_trades_btcusd\"}}"))
(def sock
(ws-conn "wss://ws.bitstamp.net"
:on-connect (fn [sock]
(println "Connected.")
(ws-send sock sub-msg))
:on-msg (fn [sock msg] (println ">" msg))
:on-close (fn [sock msg] (println "Closed:" msg))))
;; After you've seen enough...
(ws-close sock)
One word of caution — if you’re connecting to a particularly noisy socket that sends large messages (e.g. BitMEX), you might have to tune the frame & frame payload size.
(http/websocket-client
"wss://www.bitmex.com/realtime"
{:max-frame-payload 1e7 :max-frame-size 1e7})
aleph also lets you write a server to test your client (or use for whatever else) without too much trouble.
(require '[clojure.string :as string])
(defn uppercase-handler
  "Handle a message by upper casing it and echoing it back."
  [socket msg]
  (s/put! socket (string/upper-case msg)))
(def server
(http/start-server
(fn [req]
(d/chain
(http/websocket-connection req)
(fn [socket]
(s/consume (fn [msg] (uppercase-handler socket msg)) socket))))
{:port 9999}))
(def client
(ws-conn "ws://127.0.0.1:9999"
:on-connect (fn [sock] (println "Connected."))
:on-msg (fn [sock msg] (println ">" msg))
:on-close (fn [sock desc] (println "Closed:" desc))))
(ws-send client "A message")
; > A MESSAGE
(ws-send client "Another message")
; > ANOTHER MESSAGE
(.close server)
; > Closed: {:stat nil, :desc nil}
In my experience, clients based on Netty end up being more durable and reliable than Jetty-based clients. Additionally, aleph has plenty of contributors and a good history of fixing issues and keeping the library up to date. It also handles tons of other networking functionality under a consistent interface, so if you’re going to add a dependency to your application and learn the paradigms that it suggests, it might as well be one that you can continue to use as your application matures.