Aton Kamanda

I am currently working on LLM agents for machine translation at Alexa Translations. Prior to this, I worked as an ML engineer at AwakeAI, where I built a real-time video understanding system based on the V-JEPA architecture. Before that, after graduating from Mila, I worked on several research initiatives focused on enhancing LLMs for code generation.

I am particularly interested in extending the capabilities of monolithic LLMs toward autonomous agents that can act, reason, self-correct, select goals, and use tools.

Contact me if you are interested in collaborating.

About me

Applying state-of-the-art research in production.

  • My experience has been in turning state-of-the-art models into practical, high-performance solutions, and in navigating the challenges of moving from research papers to production models. This has given me the ability to quickly grasp complex papers from top machine learning conferences and implement them efficiently in real-world environments.

Building Neural Networks

  • Artificial neural networks are digital, which allows all copies of a model to share weights across different physical devices and to communicate what each copy has learned from its own part of the data by sharing gradient updates. This property enables large neural networks to leverage parallelism, processing vast amounts of data and acquiring knowledge at a scale unattainable by humans. This is exemplified by LLMs' ability to process almost all the content on the internet, or by AlphaGo's mastery of Go through more gameplay than any human could achieve in a lifetime. These capabilities make the ability to leverage large amounts of compute and data a crucial skill for producing systems with superhuman abilities.


  • Yet the most effective way to improve a model's performance on a given task is to develop an intuition for how the model interacts with the data. There are subtleties in the training dynamics that can only be learned through consistent training and evaluation. For example, LLMs learn reasoning only if the dataset exhibits certain properties, regardless of the architecture, and they struggle with arithmetic without proper embeddings. Large neural networks like GPT-4 are outperformed on some reasoning tasks by much smaller models trained with smarter schemes. Understanding these nuances is crucial to getting a model to perform the task you want.
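
The gradient-sharing idea mentioned above can be illustrated with a minimal sketch. It assumes a toy linear model with a mean squared error loss, and the "workers" are just equal slices of one NumPy array rather than real devices; the function names are hypothetical and chosen for illustration only. It shows that averaging the gradients computed by identical model copies on separate shards gives the same update as computing the gradient on all the data at once.

```python
import numpy as np

def mse_gradient(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)

def data_parallel_gradient(w, X, y, num_workers):
    """Simulate workers that share the same weights but see different data.

    Each hypothetical worker computes a gradient on its own equal-sized
    shard; the shared update is the average of those local gradients.
    """
    X_shards = np.split(X, num_workers)  # requires len(X) % num_workers == 0
    y_shards = np.split(y, num_workers)
    local_grads = [mse_gradient(w, Xs, ys) for Xs, ys in zip(X_shards, y_shards)]
    return np.mean(local_grads, axis=0)

# Toy data: 64 examples, 3 features (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = rng.normal(size=64)
w = np.zeros(3)

full_grad = mse_gradient(w, X, y)
shared_grad = data_parallel_gradient(w, X, y, num_workers=4)
print(np.allclose(full_grad, shared_grad))
```

The equality holds here because the shards are the same size; with unequal shards, the local gradients would need to be weighted by shard size before averaging.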

About a theory of intelligence and cognition.

  • The free energy principle is currently the most convincing theory of cognition. It is powerful in that it does not only try to explain what cognition is; it also gives the mathematical rules that any agent endowed with cognition should follow. The brain can be viewed as a particular solution nature found to implement approximate Bayesian inference in biological organisms, and AI research can be viewed as the search for the best ways to implement it in machines under computational constraints, which is what Solomonoff's algorithmic probability theory is about.

  • Many complex mental phenomena once thought to be mysterious make far more sense under this paradigm, such as perception and action, consciousness and its ineffability, emotions, and selfhood, as well as mental disorders such as schizophrenia, addiction, and maybe even depression.
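
For readers unfamiliar with the approximate Bayesian inference mentioned above, the quantity usually minimized is the variational free energy. The notation here is an assumption for illustration and does not come from the text: $x$ denotes observations, $z$ hidden states, $p(x, z)$ the agent's generative model, and $q(z)$ its approximate posterior.

```latex
F[q] \;=\; \mathbb{E}_{q(z)}\!\left[\ln q(z) - \ln p(x, z)\right]
     \;=\; D_{\mathrm{KL}}\!\left[\,q(z)\,\|\,p(z \mid x)\,\right] \;-\; \ln p(x)
```

Because the KL term is non-negative, $F$ is an upper bound on the surprise $-\ln p(x)$, and minimizing it over $q$ pulls $q(z)$ toward the exact posterior $p(z \mid x)$.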