Transcript
Hello, welcome. It is June 12th, 2026. We're in talkra number six, discussing "Recursive Self-Improvement Is a Portfolio Optimization Problem" here with York, from the author list, and we'll just jump right in. So York, thank you for joining; looking forward to this presentation and discussion. >> Yeah, thanks, man. Thanks for having me. So I think the right place to start is, first of all, what is portfolio optimization? Then also, what is recursive self-improvement? Those two things, and then how you recast recursive self-improvement as a portfolio optimization problem, which is the idea of the paper. And then of course we walk through it and give a little bit of evidence for it. But the first thing
when we're talking about portfolio optimization, to start off there: portfolio optimization is really the same thing as reinforcement learning, which is the same thing as model predictive control and stochastic optimal control, and, under the right units of analysis, even something like variational inference, and even things invented back with Lagrange and the calculus of variations. They're all optimization problems of a sort, usually prediction-optimization problems. Portfolio optimization was first invented in the '50s by Markowitz. What they said was: we're trying to rebalance a portfolio each day, and we're going to make a prediction of what the portfolio is going to be tomorrow. And we're going to do that by taking the rolling average. So they take the mean of the
returns on a day-to-day basis, and they have a vector for that. Then they have a covariance matrix of the relationships between all of those returns: Apple goes up by this amount, Tesla will go down by this amount, or whatever. They use that as a model of what they expected prices to do, and they make a projection one step into the future. Then they run a convex optimization solver to pick the weights of the portfolio that would get them the best risk-adjusted return, or the best absolute return, or you can put a bunch of different risk functionals on it. And that was really the
start of portfolio optimization. But if you don't appreciate where it comes from, it's not immediately obvious how they came up with these formulas. If you go back even farther, there's a probability-theory line of thought that comes out of people like Kolmogorov back in the '20s. And on the other side you have the stochastic-calculus people, people who were trying to model particles in physics, for example in the '50s, with Monte Carlo methods and a lot of this stuff. The intuition was that after they had come up with a formula for this Brownian
motion of particles, they were trying to figure out how to design systems that let them control these particles in order to achieve an objective. Whenever you're describing these particles, there's a drift coefficient and there's a volatility coefficient, the variance of the movement around it. And it turns out that the drift coefficient, where the particles are just moving in a direction, is the direct analog of the returns of an asset price. And the variance, the noise term for the Brownian part of it, is the volatility of the asset. So they basically took exactly what we were seeing in the Hamilton-Jacobi-Bellman equations, as they called them, for
these stochastic particles on the physics side of things, and they put it into portfolio optimization problems. So that's really the beginning of portfolio optimization. Since then, portfolio optimization has gone from "I want to optimize my portfolio over just one step in the future" to "I want to optimize my portfolio over the next trade and the next trade and the next trade," or maybe 50 or 100 trades, or arbitrarily far into the future. So you have multi-step portfolio optimization. And alongside it, you have the exact same stuff happening elsewhere: there was a Shell linear matrix solver that came out in 1961 or something that Shell Oil used to
optimize the valve controllers on their petroleum refineries. And you have all these other examples; the SpaceX rocket landing is another one. In fact, if you look at the Apollo lunar missions, they were using model predictive control. They had a rocket equation that tells them how the thrusters work, and they would use that model, make predictions, and then do optimizations over them. So there's a long history of people having a model that predicts the future and then optimizing over those predictions in order to achieve an objective. Over the years the technology has gotten more and more sophisticated, so portfolio optimization at this point is reinforcement learning. All of the
papers that you'll read, all of the state-of-the-art stuff, is that instead of your model being the mean and variance of prices, it's deep neural networks taking in all of the price information and making, sometimes probabilistic, predictions of prices. And you have multi-step predictions with complicated objective functions, maybe multi-objective optimization. So you're starting to get into the period where there's little difference between this and something like AlphaGo, which is sort of the big daddy, or MuZero, which is an even better version that came out later, or AlphaStar. These really cool model-free reinforcement learning algorithms are used directly in the finance industry to model and allocate resources over a portfolio. And so that's really
why the portfolio optimization problem is pretty much synonymous with reinforcement learning these days, and synonymous with model predictive control, which is what people like Yann LeCun or Fei-Fei Li would use to describe where a world model fits into this. So that's the background: when we say portfolio optimization, it's the finance term for what everyone else is doing, and the fact that those two are the same is the really interesting thing. The recursive self-improvement stuff goes back to Schmidhuber's Gödel machines and these kinds of things, and even to I. J. Good talking about intelligence explosions: the idea that an algorithm can make a newer version of itself, which makes a
newer version of itself, and very quickly you get these exponential curves of self-improvement. We see this kind of behavior all over the place, like in bacterial colonies, where you see exponential growth as well: they grow and grow and grow, and the curve obviously bends as resources run out, which I think is one of the interesting things that is often not considered in the recursive self-improvement literature. All over the place you have these intelligence explosions, these exponential curves, in nature and in a variety of scientific fields. And recursive self-improvement is usually considered a software problem: software rewrites itself, gets better, rewrites itself, gets better.
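The contrast drawn above, between unbounded exponential self-improvement and growth that bends once resources run out, can be illustrated with the textbook exponential and discrete logistic growth models. This is a toy sketch only: the function names, the growth rate `r`, and the carrying capacity are assumptions chosen for illustration and do not come from the paper.

```python
def exponential_path(x0: float, r: float, steps: int) -> list[float]:
    """Unconstrained growth: every step multiplies the stock by (1 + r)."""
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] * (1 + r))
    return path


def logistic_path(x0: float, r: float, capacity: float, steps: int) -> list[float]:
    """Discrete logistic growth: same rate r, throttled as the stock nears capacity.

    For 0 < r <= 1 and 0 < x0 < capacity, the path rises monotonically toward capacity.
    """
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        path.append(x + r * x * (1 - x / capacity))
    return path


if __name__ == "__main__":
    # Hypothetical parameters: 50% growth per step, resource ceiling of 1,000 units.
    exp_path = exponential_path(1.0, 0.5, 30)
    log_path = logistic_path(1.0, 0.5, 1_000.0, 30)
    for t in (0, 10, 20, 30):
        print(f"t={t:2d}  exponential={exp_path[t]:>12.1f}  logistic={log_path[t]:>8.1f}")
```

Both paths grow at nearly the same rate early on; the logistic one flattens as it approaches the capacity, which is the resource-limited behavior described above as often missing from the recursive self-improvement literature.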
But I don't think that way of thinking about it endogenizes all of the relevant variables for the process of recursive self-improvement. One example is the enormous amount of capex required to buy the chips on which that improvement occurs. That's obviously part of the problem; you need to endogenize that into the recursive self-improvement process. Then there's acquiring the data: if you have Scale AI hiring a bunch of people, McKinsey consultants writing out deep research reports to help you train your neural network, that's also something you'd have to endogenize. So there's the curation of the data, and then also, of course, the people using your product: the data that you collect from
users is another place where you start to touch the outside world. So when you really think about it, the appropriate level of analysis for the recursive self-improvement process is, I think, the company that is producing it. Anthropic as a company, with its business relationships, is a better unit of analysis for the recursive self-improvement process, because it includes many of the constraints and costs, the liabilities and assets, that are required to understand how that process will most likely unfold in the real world, if it turns out to be a language model company. So that's the first point here: every FLOP, every
bit of data costs money, and the upshot is that self-improvement is an economic problem. You have to understand those relationships. And on top of that, the economy enters in another interesting way: if these companies get very strong, you have people like Eliezer Yudkowsky, the "If Anyone Builds It, Everyone Dies" kind of thing. That's another way the game theory of being an extremely powerful AI enters the picture. So there are a lot of ways in which I think it's best to understand the system that is producing the self-improvement at a broader level: really the corporate level, and the corporate level plus legal plus these other things.
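The point that every FLOP and every bit of data costs money can be made concrete with a toy calculation: a self-improvement step is only worth taking if its forecast gain in equity exceeds the dollar cost of the compute and data it consumes. The `ImprovementStep` class, the `net_value` and `rank_steps` helpers, and all prices and gains below are hypothetical, chosen for illustration rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImprovementStep:
    """A hypothetical self-improvement action with its inputs priced in dollars."""
    name: str
    flops: float              # compute the step consumes
    data_gb: float            # data the step must acquire, in gigabytes
    expected_gain_usd: float  # forecast increase in equity if the step is taken


def net_value(step: ImprovementStep, usd_per_flop: float, usd_per_gb: float) -> float:
    """Forecast equity gain minus the dollar cost of the compute and data it uses."""
    cost = step.flops * usd_per_flop + step.data_gb * usd_per_gb
    return step.expected_gain_usd - cost


def rank_steps(steps: list[ImprovementStep], usd_per_flop: float,
               usd_per_gb: float) -> list[ImprovementStep]:
    """Order candidate steps by net value, best first."""
    return sorted(steps, key=lambda s: net_value(s, usd_per_flop, usd_per_gb), reverse=True)


if __name__ == "__main__":
    # Every number below is made up for illustration.
    USD_PER_FLOP, USD_PER_GB = 5e-19, 20.0
    candidates = [
        ImprovementStep("larger training run", flops=1e22, data_gb=0.0, expected_gain_usd=8_000.0),
        ImprovementStep("buy new dataset", flops=1e20, data_gb=500.0, expected_gain_usd=6_000.0),
        ImprovementStep("small ablation", flops=1e19, data_gb=1.0, expected_gain_usd=100.0),
    ]
    for step in rank_steps(candidates, USD_PER_FLOP, USD_PER_GB):
        print(f"{step.name:20s} net = {net_value(step, USD_PER_FLOP, USD_PER_GB):>10.2f} USD")
```

With these made-up prices, the dataset purchase has a large forecast gain but still loses money once its data bill is counted, which is the sense in which self-improvement becomes an economic decision rather than a pure software loop.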
So in the first part of our paper, we walk through the mathematics, the scaffolding of how you think about this. A company has shareholder equity. And there was a Supreme Court case, Dodge v. Ford, where the Dodge brothers sued Henry Ford for being too kind to his workers, essentially paying them too much money. That was when it became, essentially, legally mandated by the Supreme Court that a corporation has to maximize shareholder value, which here means maximizing shareholder equity: assets minus liabilities. And then you can start to say, well, if we're going to cast a corporation as a reinforcement learning problem or a portfolio optimization problem, there needs to be some sort
of reward. That would be the change in the company's equity from one period to the next, and the next, and the next. This roughly corresponds to getting slightly better returns on a move in Go or something. Then you take the discounted sum of those future changes in equity and take some sort of measure over it, usually the average, because you have a set of possible futures that could occur based on the decisions you're making. That's essentially your objective function: the maximum discounted sum of future changes in corporate equity over time. So already we have some of the machinery here. And
you can also think of a corporation as a bundle of assets. This is where, true to form, reinforcement learning or model predictive control starts to enter the picture. You have your sensors, and the sensors would be, broadly speaking, the data. If you're the Apollo mission, that's the Kalman filter taking in the data; it's the screen pixels if you're training on an Atari video game, or the DNA if you're training a protein folding model. The same thing is true of arbitrary corporations, for reasons we'll describe a little later. The corporation we're going to focus on in this paper is a quant fund, because it turns out that I
think it's the most well-specified. The sensors in the case of a quant fund would be all of the data it takes in to make its downstream decisions: asset prices, geospatial data, capacity utilization for a factory, anything you can think of that might be useful for predicting or optimizing that shareholder value. That's what corporations can use as sensors. The actuator side of the equation would be, as in a humanoid robot, the actual interface with the world. For a corporation, that might be your employees, or your API calls to a brokerage, or whatever it is. So those would be which brokerage accounts
do you have? In the case of a modern quant, how much AUM do you have? What kind of margin contracts do you have? What kind of broker-dealer relationships do you have? Do you have friends in the SEC? Whatever it might be. And then there's the map between the two, which, broadly speaking, we treat as the parameters. If you have a model that's taking in that data and mapping it to predictions of what it thinks that data is going to look like in the future, you probably have a neural network doing that these days. That's your parameters. That would be
your API compute, your literal neural network architecture, its trained weights. And then in the case of a quant, you have this vector of deployed assets that are already in the market: your current portfolio holdings, which we'll term I here. This would be: I have this much Apple, this much Google, this much of some commodity, this much gold, this much oil. You have an outstanding portfolio of assets sitting there. The reason we specify this variable separately from the rest of your data is that it turns out to be nicer for the equations. These variables are not
mutually exclusive; it's just a nice way to slice things. The last one is this R&D symbol, and this one is going to be very important. R&D is essentially how the firm processes information about itself: all of its previous trades, all of its previous model parameters, all of the experiments it has run, all of the previous data it has received. Essentially a history of all of that information, which it uses. It covers any of the expenditures or operational capabilities used to improve the other things here. For example, if you want better sensors, if you want to acquire more data, roughly speaking we'll just call that R&D. Or if
you want a bigger neural network, or to buy a bigger supercomputer, that would be R&D. So with these five things, we say: this is a state vector. There's a stock of each of these variables at any given time that you could measure in dollars. And then there's an action vector. An investment action would be rebalancing your portfolio. A sensor action would be acquiring new data. An actuator action would be spending cash to acquire a new brokerage contract. An R&D action would be hiring researchers, or running an auto-research experiment, or whatever it is. And a parameters action would be doing a training run to make the model bigger, or
something like that. Now, the world in which this corporation lives is, broadly speaking, what a lot of the literature calls the environment. The corporation is not separate from it. This is like what the active inference community calls the Markov blanket; I believe it's basically the question of where the boundaries of the corporation are, the Coasean boundary, so to speak. If the firm is small, if you're very tiny and manage something like 100k, you're not going to affect the market in which you're trading, and you're not going to affect the GPU market when you're trying to buy GPUs. So for the sake of this paper we assume that's true, but it actually
doesn't change the underlying equations; you just need to account for it with some cool math from Markov-blanket-adjacent work. For the sake of this, we essentially treat the state of the corporation and the state of the exogenous things outside it (government actions, wars, whatever) as separate, just because it makes the equations look nice. We call that the small-firm approximation. But basically, going back to the self-forecasting loop: the central object that resides inside the neural network is this economic world model. It's a joint transition-function approximation: given this state and a certain action, what is the next state? What is the
next state after that, and the next? It's the ability to predict, as a function of you doing a trade or making an R&D investment, what the state of the corporation is going to look like next and what the state of the environment outside the corporation is going to look like next. In practice this could be two separate neural networks, or it might not be neural networks at all; there are a variety of ways to model it. But this joint economic world model is, I think, the central object that a recursively self-improving quant fund will have. It's an object that eventually lets you run literal Monte Carlo rollouts and make predictions about what
the future state of the corporation and the environment is going to look like. As the firm makes these decisions and operates in the world, it's going to collect channel-specific histories, for the sensors, actuators, parameters, and so on: histories of the decisions it made in the past and what resulted from them. In reinforcement learning terms, it's going to collect rollouts. As you go through this process, this history gets bigger and bigger, and you can learn more and more from it. You can also acquire data about the environment; usually you can just buy data from an API or something. But
over time, the firm is going to collect more and more history, and what that eventually lets you do is forecast the effects your decisions will have on the corporation. It starts to let you run this self-forecasting loop, where you can predict what the marginal experiment or the marginal improvement in data will do to your reward. Most people think only of the trading side: obviously, if I rebalance my portfolio, I'll be able to predict the returns on that rebalance. But if you run enough experiments, you'll start to be able to predict, if I go and acquire more data,
what is the marginal return on equity I get from that? What is the marginal return on equity for adding a new brokerage, or even, potentially, for some sort of research experiment? That's what the rest of the paper is about: how do we construct this self-forecasting loop? How do you construct something that can predict itself? Let me skip down here a little bit. This channel-specific world model section we'll also skip past for now. So, go to here. If we treat the sensors as an asset, what we found is that there are very predictable scaling laws for data. As we add more data to the model, there are
very predictable scaling laws. In other words, from the channel-specific history of data, as we've collected more and more data and run more experiments, we found very predictable scaling laws for data: if we add more data, we can tell you how much better the model's loss will get. From that, we can tell you how much better the returns of the trading algorithm will be, which tells us the marginal return on equity. So we have a direct line, on which we can calculate derivatives, from the equity of the corporation all the way back to a marginal change in data. Here we're already starting to see, for each gigabyte of data, how much that is worth
to the company, and we have the equations for it. We can see that there are a few more orders of magnitude of data we could scale into, and we have the parameters fit to this. A similar thing holds on the actuator side: as we add the ability to trade more assets, we can start to see what the marginal return is. For some reason the y-axis labels aren't rendering here, but it's the annualized return in this case, as we add the ability to trade more assets while holding fixed the number of assets the model is trained on. And you see that there are also predictable scaling laws on the model's ability to make money as you give it more assets to trade.
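The chain described above (more data, lower loss, higher returns, higher equity) can be sketched as a power-law fit in log-log space followed by the chain rule. This is a toy under stated assumptions: the pure power law for loss versus data, the linear link from loss to annualized return, the synthetic measurements, and the constants `RETURN_PER_UNIT_LOSS` and `AUM_USD` are all hypothetical and are not the paper's fitted forms or parameters.

```python
import numpy as np


def fit_power_law(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Fit y = a * x**b by ordinary least squares on log y = log a + b * log x."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)  # polyfit returns [slope, intercept]
    return float(np.exp(log_a)), float(b)


# Synthetic, noise-free measurements: validation loss vs. gigabytes of training data.
data_gb = np.array([1.0, 10.0, 100.0, 1_000.0, 10_000.0])
loss = 2.0 * data_gb ** -0.1

a, b = fit_power_law(data_gb, loss)

# Hypothetical links further down the chain (assumptions, not fitted values).
RETURN_PER_UNIT_LOSS = 0.5  # annualized return gained per unit of loss reduction
AUM_USD = 1e8               # capital the strategy trades


def dloss_ddata(gb: float) -> float:
    """Derivative of the fitted loss a * gb**b with respect to data (negative when b < 0)."""
    return a * b * gb ** (b - 1)


def marginal_equity_per_gb(gb: float) -> float:
    """Chain rule: dEquity/dData = dEquity/dReturn * dReturn/dLoss * dLoss/dData."""
    dequity_dreturn = AUM_USD              # one year of return on current capital
    dreturn_dloss = -RETURN_PER_UNIT_LOSS  # lower loss -> higher return
    return dequity_dreturn * dreturn_dloss * dloss_ddata(gb)


if __name__ == "__main__":
    print(f"fitted loss ~ {a:.3f} * GB^{b:.3f}")
    for gb in (100.0, 1_000.0, 10_000.0):
        print(f"{gb:>8.0f} GB -> marginal equity ~ ${marginal_equity_per_gb(gb):,.2f} per extra GB")
```

Because the fitted exponent is negative, the marginal value of an extra gigabyte shrinks as the dataset grows, which is what lets a firm compare it against the price of that gigabyte. The same `fit_power_law` helper could be pointed at the actuator curve (annualized return against the number of tradable assets), again only if a power law is the right form; on noisy real data you would also want confidence intervals on the fit before trusting its derivative.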
And so that's another thing: if you can establish these scaling laws, they're the most straightforward way to predict, with very simple equations, the return on an investment in adding more assets. You can go figure out how much it costs, and then once again you can apply the chain rule and get all the way back to the change in equity of the corporation as a function of changes in actuators. And this one's for the Sharpe ratio, because a lot of finance people like it; we have the numbers we fit here. Now, the R&D one is the most interesting one, because this is where the
"foom" kind of stuff comes from. What we're trying to do here is measure the marginal return per research-and-development experiment. There's this recent trend of auto-research, where you can say: if I have Claude (what is the new one? Fable 5) run an experiment, and I collect a bunch of data sets of that, how much better does the prediction get as a function of those auto-research experiments? We've started doing that as well, where we can start to measure and understand this stochastic process, which
follows the same type of math, it turns out, as world-record modeling: you're trying to predict how much the 100-meter dash record can improve over time. You want to do the same thing here with how much your best model improves. So you model that as well, and once your standard error gets small enough, you can start to say things like: if I run an auto-research experiment right now, how much return on equity do I get? So now we have not only the sensors model, so we can tell how much money we get, or how much better the company gets, from data; we can
also do it from actuators, and from research and development. Doing it from investments is rather trivial; we skipped over it, but that's the whole bulk of quantitative finance: if I rebalance my portfolio, what's my return on equity? The last one is the parameters, and that's the one we don't have results on in this paper yet, because the experiments are being run as we speak. But essentially, if we scale up the size of a model, for example, there are the Kaplan and Chinchilla scaling laws that tell you the optimal ratio of data to parameters. If we can start to get an idea of how much bigger the
model gets, we can figure out how much that costs based on AWS compute prices. We can say: all right, if you make the model this much bigger, or if we change the architecture to this type of architecture, how much better does the model get? How much does the loss go down? If we can get a scaling law on that, which evidence suggests we can, then we'll be able to say what the marginal return is on giving the model more parameters. Then we can go from loss to trading to change in corporate equity, and we'll have essentially a rudimentary solution to the problem of turning the entire self-improvement process into a reinforcement learning or portfolio optimization problem, where
we can look at the marginal return on research and the marginal return on all these things and ask: what is the best decision we could make right now to increase equity next period, or over a discounted sum of the next periods? So that's what the parameters-as-an-asset section is about, where we say the model sweep is forthcoming. There's some interesting stuff about continual learning we could talk about as well. But that's the idea: in quantitative finance, because everything is so complete, with language models and a good team, I think these are the first companies that are going to be able to recursively self-improve. There are a bunch of other things in the paper and
some other appendices, but that's the general thesis of our paper, and we show the first preliminary evidence of what this would look like in practice. >> Thank you. >> Excellent. >> Super interesting. >> Then I'll open up with a few of my thoughts and maybe a question. I just want to say, to start off: the fact that you drew the initial parallel from the Brownian motion of particles, with the Brownian motion as volatility and the drift as the return curve, just goes to illustrate how, as above, so below, alpha to omega, these natural and man-made systems always go back to the same deep patterns that are found at every scale. And that, for me, speaks to... I mean, truth is
not the right word. It speaks to the reality of them, and that these patterns actually are emergent and can be seen and witnessed. And your work here is helping to elucidate that in economics, which I think is super cool. When you see it in nature, in an economy, in the Bitcoin price, in the behavior of ant colonies, or in the movement of particles, it reminds us how beautiful and interconnected the world is. So it's super pleasant to hear that. And one of my questions was going to be about the nuts and bolts of the actual trading architecture that an agent like this would have. >> Are the tools that are afforded to an
agent (I'm using the term agent to refer to this recursive self-improving model or corporation) >> is that also a parameter? Because you can imagine giving it different derivatives capabilities. Can it buy insurance? Can it participate in all kinds of markets, whether that's prediction markets, traditional stock exchanges, buying commodities, or taking equity in private corporations? So how does that come into the equation? >> Yeah, so I'd say there are differences in degree and differences in kind, two separate types of changes. One of them is, let's say you're already trying to approximate equities. There are approximately 30,000 equities and
ETFs that have traded on the US stock market since 2004. Let's say you're only looking at 100 of them, and you say: well, I looked at this scaling law, I can see it's working. This one here, the one we were looking at earlier, is exactly this kind of thing. It goes from one asset, so you can pick a set of one asset and you get some confidence interval there, and then you continue to increase the number of assets. That's one way to look at it. You'll notice, though, that the x-axis here is dollar-weighted tokens. This is actually not a concept we came up with, although we kind
of came up with it independently. There's this scaling law paper from 2025 (Munichof, I believe, is the author). The question is: what is the right x-axis to use when you're trying to come up with these scaling laws? So we did a bunch of research into what a scaling law is and how you measure it, and you want to use what they call quality-adjusted tokens. The idea is that in any kind of system there are conserved quantities; energy, for example, is conserved in physical systems. So you ask: what is the conserved quantity
of finance? Excluding inflation (which is also to some degree conserved, since it usually comes from printing money, although there are other things), what we found is that there are conserved dollars. And we found that if we instead take the number of assets traded, the R-squared on this line is nowhere near as strong. In other words, if you don't normalize by whatever the conserved quantity is, you tend to get much thicker confidence intervals, and the trends are not as clear. But if instead you say that what we really care about is the flow of capital moving from one place to another, we're trying to model the flow of capital across the entire economy. We have
these lines here. You'll see them around 10^16 and 10^17. This first line represents all the asset prices you could have access to. I can't zoom in very well with the change, but this line here corresponds to having all bonds, all stocks, all commodities, all real estate, all interest rate swaps, private equity, all of the derivatives. To the best of our ability, we tried to estimate approximately how many dollars are in tradable markets that we could realistically get access to. And as you increase the amount of dollars under observation in that sense, you get increasingly
better predictions. That's the first thing. This second line here is non-market data: not markets in the sense that I'm buying a security or a commodity. This would include credit card transactions, gas being pumped at your gas station, and the input-output tables of all the different industries buying and selling products to each other. That's basically what these two lines are. So I think the dollar weighting is one thing that is very important in understanding how these scaling laws work. As for the first part of your question, which was how the actual trading infrastructure works for an agent and what
its affordances are, the right way to think about it is that if I give an additional affordance to, say, the agent, I am giving it exposure to a much larger set of dollar-weighted tokens that it can act upon. So you could imagine that at first it's just going to be highly API-driven: I can just send an order to the New York Stock Exchange super easily. But as humanoid robots come online, as container ships get automated, and as we add more control systems to the real world that are API-complete, you can imagine that an entity like this could also start to command factories. There are those whole lights-out factories in
China. So you can imagine systems like this also controlling factories, individual humanoid robots, drones, or more interesting things like rockets. That's the way I think about it: that would be more of a change of kind, as opposed to a change inside a given domain, where you're just increasing the number of assets you're able to trade, say from 100 equities to 1,000. That's very predictable. Changes of kind have disjoint breaks to some degree. And one of the things we're going to prove out here is whether this scaling law continues purely as a function of dollars. Is this truly a conserved quantity, or is this just an
equity-centric, market-centric scaling law? Does this also continue if you add credit card data or geospatial data or whatever else? But yeah, those are my thoughts on it. >> Yeah. Also, follow-up questions? Dan, please. >> Yeah, just on these scaling laws. It's really interesting, the log axes and how that connects to scaling laws like the natural size limits or Pareto-optimal morphologies for different firms. For example, trading only one asset, you stand to gain >> by adding a second asset. But the marginal gain from the next asset starts to decrease. >> So then there becomes some sort... >> Yeah. Yeah. So, analogously, sensor channels and actuators all have this logarithmic >> this kind. >> Yeah. So
it's just like the diffusion level of oxygen, and how that relates to what kinds of morphologies can exist within a given circulatory system and how their metabolic rates relate. So, similarly, we wouldn't necessarily expect to see all of the assets traded by one entity [clears throat], because it might get outmaneuvered by several small or medium-scale ones. And then there's that triple intersection, which is where it's all driving towards: the financial; the epistemic, meaning the value of information, the reduction of uncertainty about something, the informational; and then grounding it in the joule and the electricity, however that loop gets closed, by Bitcoin miners or by data centers. That kind of triad is the total operating environment
of any cyber-physical system nowadays. So with strong analytical and empirical relationships to help you convert amongst them, you're never going to do worse by knowing more about that triad. In the worst case, without them, you're going to be throwing out too many or too few auto-research dispatches, just vibing about whether you rent one GPU or 5,000, and not knowing how that connects to the marginal relevance for the actual group. >> Yeah. There are a few things that are interesting here. One of them is that the y-axis here is the median absolute deviation, so the median absolute error on a per-asset basis, per data stream so to speak. What this really shows is that
the prediction error per asset goes down as you add other assets. That's one thing that's interesting. It characterizes well the fact that the economy is a graph, right? There are a bunch of relationships between different entities, and as you cover a larger and larger subset of that underlying graph, you get simultaneously better at predicting everything on it. There is this intrinsic noise floor, though. Now, one of the things that's interesting is that you'll notice these numbers range from .8 down to .5 or so. If you look at language model scaling laws and ask what those look like, the loss
function they're measuring is usually cross-entropy, which means they're measuring something like the KL divergence, the difference between one distribution and another, and those numbers just tend to be much larger numerically. That's what you end up seeing. Whereas this is a regression loss and they're doing a classification loss, so the classification losses actually look different. One thing we found that's interesting concerns this regression problem, where you're trying to predict: this is what I think the price should be, here's what it actually was, I'm going to take the difference between them and square it, and that's my loss. If
instead you turn it into what they call a pinball loss, you're trying to place the next return along the distribution of possible futures. So you have a bell curve there, and you're predicting which part of the bell curve the outcome is going to land in. It turns out that's actually a better way to do this; a related idea is distributional reinforcement learning. A lot of the field is moving in that direction because, one, it has some nice properties, and there are information-geometric reasons why it's better for training these models. But it also changes the way these scaling laws look
because this one here is actually a remarkably accurate scaling law. We have a noise floor that we've estimated here, in terms of the minimum possible loss that we could get asymptotically. I believe it's about .3... yeah, the noise floor is about .42, and it's estimated using hierarchical Bayesian posterior estimation with NumPyro. But the truth is that in practice this noise floor is a function of the architecture itself, and it's also a function of the loss function. So consider the particular curvature of this chart: if you look at language models, they don't even do this. They don't
model a noise floor, because the numbers tend to be so much larger that you can make projections out several orders of magnitude and you won't get predicted losses lower than zero. In our case, because of the type of loss function we're using, we're seeing curvature here; with slight changes in the loss function, this line would look a little different and you wouldn't need to put that noise floor on it. So there are some pretty subtle things you can do to affect this. It is true, though, that when you zoom out and look at what it is that we're doing
here, there are a lot of similarities between this and everything else: all of these scaling laws, the fact that a circulatory system is fractal in nature. The same thing is true of the economy; it is highly fractal. You can observe trends at a variety of different scales. You tend to observe these power-law loss curves any time you're trying to learn something with a fractal geometry; this power-law fractal stuff is all over the place. So any time you're trying to learn a fractal system, which the economy is, you get this kind of power-law decay in the loss, which is very interesting. These ones, however, don't appear to
have that; these tend to go up more linearly, which is kind of interesting. I'm not exactly sure why that is. We fit a bunch of curves to it, and there's no difference between a linear fit and a polynomial or whatever [snorts]. In terms of the annualized returns, it seems to be just a straight line. >> That would suggest that if we simply add more assets, we can simply increase the returns. >> It would seem so. Yes, it seems that as long as we don't run out of ideas, there is more money to be made. There's one thing that's pretty cool here: the continual learning regime. So the line we were just looking at is
this blue one here, although this one is taking the loss summed across all of the predictions, right? [snorts] The blue one is trained until convergence, so it's where the model is done training and we say, cool, this is as good as it's going to get. This orange one is the amount of loss after one epoch of training: the model has seen the data once. What does it learn? What we see is that these lines get closer and closer together as we increase size. And so we can project forward and ask when we expect that, looking at
the data once, we will already achieve an optimal loss. Basically, we will only need to do one epoch. And then what happens when we need less than that? What happens when we need only half of the entire data set to learn it perfectly? What's way out here in 10^17 territory? When we get up to that much data, our expectation is that we could train a model that, with very few samples, completely understands the dynamics of the economy. That means we can just update it relatively infrequently in practice, and it will be highly resistant to regime change because it can learn the new regime immediately as it goes on. So I think that this is kind
of what the continual learning plot shows: as you increase model size, and, though I don't know how much research is going into this since it's kind of novel, as you increase the data size, you need fewer and fewer epochs until eventually you converge with only one epoch, and then you only need a small portion of your data. So one of our objectives is to get to the regime where we don't need to update the model, or where we can update the model every day and that's all it needs to perfectly learn the new dynamics. >> Yeah, that was actually going to
be my next question: how do you think about the time scaling, or, to put it in a Bayesian sense, how often is the agent sampling its environment? Combinatorially it could be infinite if it has to survey every bit of data at different time scales: submillisecond sampling, millisecond sampling, daily sampling, quarterly sampling. So how does the agent go about solving that problem? And to the point you just made, the optimum is that you do it as infrequently as possible while still being able to interpolate a really accurate picture. But how will that be resolved? >> Yeah. So I think there are actually two things. In practice, the goal of the company, as we say, is to move
towards a differentiable corporation; that's the objective, right? So part of this is what kinds of decisions you make. Part of the decision is that I want to have a good posterior over all of the parameters of my scaling laws. So I could run a new experiment that will tighten my confidence interval on one of the parameters I'm using to optimize my company. It's basically the difference between explore and exploit. And I think the explore side of things will have a different frequency of experiments, a different frequency of updating those posteriors, than the exploit side of things, which is
just: I'm going to trade, I'm going to go bigger along the scaling law, I'm going to do the adjacent thing that I know will work very well with high confidence. That stuff is relatively easy to exploit. It's on the explore side of things where you say, hm, my standard error on this parameter is relatively large, which roughly corresponds to a wide confidence interval on that parameter. I could run more experiments to tighten that confidence interval, but those cost money. So there is a value to tightening that confidence interval, to getting a better understanding of what a given parameter is. And in fact that reduction in
uncertainty itself might occasionally be more valuable than exploiting the things you've already done. So I think what it'll probably look like is that on a day-to-day basis we'll be updating the neural networks consistently to do the exploitation. But occasionally, especially when a lot of uncertainty shows up, we'll reach a point where we say: hm, we're starting to get a wide confidence interval on this specific scaling law parameter, and it's not entirely clear that making it bigger is better anymore. Then we might start running more experiments to narrow that confidence interval, or we might run more experiments on the research side.
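The explore/exploit rule described above can be made concrete with a small sketch. It is not the company's method: it assumes the scaling law is a pure power law, loss ≈ a · dollars^(-b), with no noise floor, fits the exponent b by ordinary least squares in log-log space, and uses a hypothetical tolerance `max_half_width` on the roughly 95% confidence interval to decide whether to run more experiments (explore) or keep scaling (exploit). The function names and the synthetic data are illustrative only.

```python
import numpy as np


def fit_exponent(dollars, loss):
    """OLS fit of log(loss) = log(a) - b * log(dollars); returns (b, standard error of b)."""
    x = np.log(np.asarray(dollars, dtype=float))
    y = np.log(np.asarray(loss, dtype=float))
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)          # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the OLS coefficients
    return float(-coef[1]), float(np.sqrt(cov[1, 1]))


def choose_mode(dollars, loss, max_half_width=0.05, z=1.96):
    """Hypothetical rule: explore when the ~95% CI on the exponent b is wider than tolerated."""
    b, se = fit_exponent(dollars, loss)
    half_width = z * se
    mode = "explore" if half_width > max_half_width else "exploit"
    return mode, b, half_width


# Synthetic example: 8 points from 1e10 to 1e17 dollars with 1% multiplicative noise.
rng = np.random.default_rng(0)
dollars = np.logspace(10, 17, 8)
loss = 2.0 * dollars ** -0.05 * np.exp(rng.normal(0.0, 0.01, dollars.size))
print(choose_mode(dollars, loss))  # many points and low noise, so the CI is narrow
```

A real version would fit the noise floor as well (the talk mentions a hierarchical Bayesian fit in NumPyro) and would weigh the dollar cost of each experiment against the value of the narrower interval; the fixed threshold here only stands in for that trade-off.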
I think it's going to be very much low-level optimization and updates, getting a better posterior pretty consistently, at the exploit layer. On the explore layer, we're going to be doing that less frequently, as a function of the uncertainty. We talk about that in the RSI section here, where over a 90-day horizon we ask what improvement we get in the algorithm as a function of research and development, and then we have this decay, which is how much worse the algorithm gets as a function of the market learning our strategy or whatever else. As long as there's a distance between these, then you see
your derivative essentially getting steeper, and once these two are equal, your self-improvement stalls: you're no longer able to improve faster than you're decaying. So this is essentially an analog of, or could be understood as, the S-curve of any corporation, or the S-curve of any system. As long as this blue one is bigger than this red one, we're in the accelerating, exponential phase. As the fund gets bigger, it's going to get harder and harder to make more money, right? That would be this red line getting closer. If our technology gets much better and it turns out that
Mythos 6 or Mythos 7, or a language model of our own, is that much better, maybe this will get a little farther away. Or maybe we'll find a new algorithm that is significantly better at scaling. So there's a balance between these two things. Exploiting basically keeps this distribution the same, but the decay is going to start creeping up on us, and exploration is going to move it out again. So there's going to be a trade-off: if we have a huge distance here, we should just exploit right now; if they start to catch up to each other, we want to move that distribution ahead. And so that's going to be
the process. >> Well, as we head into this last little bit, York, what are your group's next moves? How can our extremely small though extremely high-quality talk audience learn or do more? >> Yeah. So, to talk about what we're doing next: as we can see in this chart, the corporation at this point has a large distance between the amount of improvement we can get and the decay. This is what we call the TRSI, so to speak: the distance between decay and alpha. Right now, because it's large, we're in the exploit stage, and what that means is that for each marginal unit of capital we get, we can
deploy it and get an enormous return, far above what it actually costs us to get that return. So for now, we're raising significant amounts of capital, probably from venture if we have to, but we're more interested in individuals who actually want to fund work like this. Not because they want to dump a product on retail at an IPO at some point, but because we need to make sure there are reasonable heads thinking about this: if we're going to be building AI, we need to actually care about how much it costs, right? We need to actually pay attention to things like the economic impacts. There's some really cool work that
we're about to release, actually, which is an economic graph: our current best graph of the economy. It's about $50 trillion moving around at a quarterly frequency, all of the inputs and outputs of every industry. On top of that, for each industry, what are the different jobs employed in it; for each of those jobs, what are the tasks that each person in that job does; and then what is the automatability by language models of each of those tasks. So you can go back up the daisy chain and actually come up with numbers like the dollar value of the automatable tasks in an industry.
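The daisy chain described above (industry, then jobs, then tasks, then automatability) can be sketched as a simple roll-up. This is a hypothetical illustration: the class names, the toy glassblowing numbers, and the assumption that a task's automatable dollar value equals the job's wage bill times the task's share of the job's time times an automatability score in [0, 1] are ours, not the graph's actual schema or weighting.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    time_share: float      # fraction of the job's time spent on this task (assumed)
    automatability: float  # assumed 0..1 score for language-model automation


@dataclass
class Job:
    title: str
    wage_bill: float       # dollars per quarter the industry spends on this job (assumed)
    tasks: List[Task] = field(default_factory=list)


def automatable_value(jobs: List[Job]) -> float:
    """Roll tasks up to jobs up to the industry: wage dollars tied to automatable work."""
    return sum(
        job.wage_bill * task.time_share * task.automatability
        for job in jobs
        for task in job.tasks
    )


# Toy, made-up numbers for a hypothetical glassblowing industry.
glassblowing = [
    Job("glassblower", 1_000_000, [Task("shaping glass", 0.8, 0.0),
                                   Task("order paperwork", 0.2, 0.6)]),
    Job("office manager", 200_000, [Task("invoicing", 0.5, 0.7),
                                    Task("customer email", 0.5, 0.5)]),
]
print(automatable_value(glassblowing))
```

Keeping the chain as plain nested records makes each step auditable: you can see which task in which job contributes which dollars before summing across industries.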
we can tell you what the impact of AI, basically language models, is going to be on the glassblowing industry and whatever else. I think it's incredibly important going forward that we move on from this era of focusing only on language. Language is really cool, but there's also protein folding and therapeutics and a bunch of things there. There's a bunch of things in Fei-Fei Li's and Yann LeCun's companies. There's a bunch of things in humanoid robots. There's a bunch of economic things like what we're doing. Our objective really is to start the move away from just language, so that people can focus more on things that are directly economic in nature. There are really no companies that are trying
to understand GDP better than we currently understand it. What we're doing right now is using the capital to do a little more exploitation, to get this a little bigger and make our models better, so that we can follow these scaling laws, get the auto-research running, and then use that to improve capital allocation at scale in the United States. Our eventual first big product, and when most people will actually hear about our company, is an ETF managed by our AI that people can put money into. And instead of having your capital allocation algorithm be one that's driven entirely
by size, which is the way the world works today with index funds, I believe that in the future AI is going to manage the capital. Just like AI could be your doctor, AI is going to be the retirement account manager. Instead of the world's simplest algorithm, "if size is big, buy," it's going to look at every single thing in the entire economy, all the graphs, and ask what is the best possible risk-adjusted return I can make with this person's capital. And if we can get that to happen at scale, our entire economy is going to grow so much faster, and I think we can potentially eliminate recessions. If humanity
decides to do this, we live in a world where our capital allocation doesn't make dumb short-term decisions, where we actually account for the fact that there is a right price for things. If we can do that at scale, that would be a much better vision, and we could actually fund space exploration to the tune of trillions of dollars because we'll be able to value it properly. So our long-term goal is to do something like what Elon did when he forced other car manufacturers to care about electric cars: we want to force asset managers to use AI by showing that it can be done. And if we can do that, then I think we'll have succeeded.
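To make the contrast between size-driven index weighting and "the best possible risk-adjusted return" concrete, here is a minimal sketch. It uses Markowitz mean-variance optimization (the maximum-Sharpe, or tangency, portfolio) as a stand-in for a risk-adjusted allocator; that choice, the function names, and the toy inputs are assumptions for illustration, not the fund's actual model. The unconstrained tangency portfolio can take short positions, and the normalization assumes the raw weights sum to a positive number.

```python
import numpy as np


def cap_weights(market_caps):
    """Index-fund style: weight each asset by its share of total market capitalization."""
    caps = np.asarray(market_caps, dtype=float)
    return caps / caps.sum()


def tangency_weights(mu, cov, risk_free=0.0):
    """Maximum-Sharpe weights, proportional to inv(cov) @ (mu - risk_free), scaled to sum to 1."""
    excess = np.asarray(mu, dtype=float) - risk_free
    raw = np.linalg.solve(np.asarray(cov, dtype=float), excess)
    return raw / raw.sum()  # assumes raw.sum() > 0; otherwise the sign flips


def sharpe(weights, mu, cov, risk_free=0.0):
    """Expected excess return per unit of volatility for fully invested weights."""
    w = np.asarray(weights, dtype=float)
    excess = w @ np.asarray(mu, dtype=float) - risk_free
    return excess / np.sqrt(w @ np.asarray(cov, dtype=float) @ w)


# Toy inputs (made up): expected returns, uncorrelated risks, market caps in $T.
mu = np.array([0.06, 0.08, 0.10])
cov = np.diag([0.04, 0.09, 0.16])
caps = [2.0, 3.0, 5.0]

w_cap = cap_weights(caps)
w_tan = tangency_weights(mu, cov, risk_free=0.02)
print("cap-weighted Sharpe:", sharpe(w_cap, mu, cov, 0.02))
print("tangency Sharpe:    ", sharpe(w_tan, mu, cov, 0.02))
```

With these inputs the largest cap sits on the most volatile asset, so size-based weighting gives up some risk-adjusted return relative to the tangency portfolio. In practice the hard part is estimating `mu` and `cov`, which is where a predictive model of the economy would come in.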
>> Tucker, do you have a closing note? >> Well, to reiterate one of the points you made in the middle of the conversation, York, about how the economy is like a natural system and economics is an energy exchange: ultimately, abstractly, it seems inevitable that intelligent, cybernetic, digitally based agents are going to be the dominant actors in that space, just as in the jungle it's the apex predators. And we're just at the dawn of that. Of all the work I've seen so far, your take on the economic agent seems to be the furthest along and the most holistic view of how to build it. So it's very exciting. You can see how it's going to touch everything, from a simple ETF trading agent in the beginning
to something that eventually is pulling levers in the real world across factories, space, the entire economy. It's really exciting. Just one specific question about your capital raise: do you have a lower limit for individuals to invest? Just in case anyone out there wants to reach out to you. >> I mean, for people we like, not really. I'd say I'm kind of a populist, to be honest, and I think that's kind of the point. Take OpenAI: if you wanted to be invested in them, you couldn't, right? Retail can now, when it's
already worth a trillion dollars. Our thesis has always been that I want people to be able to invest in my company with as little as 1 cent, and that's really the cool thing about ETFs: we can do that. So I would say we're very much on that side. Probably not one cent for now, but we've had friends and other people tell us they can't write a $100k check, and of course there are regulatory constraints on that kind of thing, but insofar as that is attested to, yeah, we'll take money from people
we think understand where we're at and where these things are going. And one thing you mentioned that is interesting is the cybernetics side of things, some of the Nick Land-style stuff. Interestingly enough, cybernetics, Norbert Wiener and a lot of that work, the stochastic process stuff, is a direct predecessor to reinforcement learning, to model predictive control, to the stuff that all of AI is doing today. Back in the '30s and '40s, that was the state of the art. And it's very cool, because there's this route that we've taken through history
and it kind of looks like an inverse tree: there are all these different fields of science, probability theory, measure theory, set theory, calculus, and they trickle together until they combine into this model-predictive, stochastic optimal control, sequential-decision-making-under-uncertainty thing. And it's really cool to see. I can look at a quantum computing paper and very quickly realize, oh yeah, you can literally take quantum optimization algorithms, and there are equations that allow you to turn a wave function into a Hamilton-Jacobi-Bellman equation, which is, you know, reinforcement learning. So all this stuff is connected. It really is. It's kind of crazy
when you see it, but it's very cool as well. >> York, thank you for the openness and willingness. And I will just close with a short William Blake line. He wrote, "Reason, or the ratio of all we have already known, is not the same that it shall be when we know more." >> Good quote. >> Thank you. Farewell. Godspeed. >> You too, York. Looking forward to hearing more, man. Good luck. Thanks.