Moral Alignment for LLM Agents
Published as an arXiv preprint, 2024
We introduce the design of intrinsic reward functions for the moral alignment of LLM agents, and we evaluate the robustness and generalization of the framework through Reinforcement Learning-based fine-tuning of the agents.
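As a rough illustration of the idea of an intrinsic moral reward combined with an environment payoff for RL fine-tuning, the sketch below uses a hypothetical two-action social-dilemma setting. The action labels, penalty values, and weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of combining an extrinsic game payoff with an intrinsic
# "moral" reward term for RL fine-tuning. All names, payoffs, and weights
# here are hypothetical and only illustrate the general idea.

COOPERATE, DEFECT = "C", "D"  # assumed action labels for a matrix game


def intrinsic_moral_reward(own_action: str, opponent_prev_action: str) -> float:
    """Example deontological-style intrinsic term: penalize defecting
    against a partner who cooperated on the previous step."""
    if own_action == DEFECT and opponent_prev_action == COOPERATE:
        return -3.0  # illustrative penalty magnitude
    return 0.0


def combined_reward(extrinsic: float, own_action: str,
                    opponent_prev_action: str, weight: float = 1.0) -> float:
    """Total reward fed to the RL fine-tuning objective: game payoff
    plus a weighted intrinsic moral term."""
    return extrinsic + weight * intrinsic_moral_reward(own_action, opponent_prev_action)


if __name__ == "__main__":
    # Agent defects against a cooperating partner: payoff 5.0, moral penalty -3.0.
    print(combined_reward(extrinsic=5.0, own_action=DEFECT,
                          opponent_prev_action=COOPERATE))  # -> 2.0
```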
Recommended citation: Tennant, E., Hailes, S., Musolesi, M. (2024). "Moral Alignment for LLM Agents." arXiv:2410.01639. https://arxiv.org/abs/2410.01639