Crafting the Future of Gaming: Personalized Experiences through Productized Reinforcement Learning Models
By Michael Kolomenkin
Ensuring an engaging, diverse, and personalized gaming experience for our players is not just our bread and butter; it is at the heart of Playtika’s success. It is our daily objective, and it is the stage where technology and creativity meet and game design collaborates with engineering.
This grand mission draws on a myriad of talents: game designers, developers, and AI experts collaborate, each contributing a crucial thread to the tapestry that forms the game.
Our AI Personalization streams, a product of this collaboration, aim to transform gaming from a one-size-fits-all model into an experience tailored to individual players’ tastes. To achieve this, we created the OptX-AI system, a pioneering real-time recommender system.

[Call to action: We realize that personalization challenges are not unique to Playtika or the gaming industry. If you’ve worked on similar problems in your professional career, please share your experiences in the comments section!]
Personalization in mobile gaming, we’ve learned, is about navigating and resolving sometimes contradictory domain-driven constraints. For example:
- Game studios aspire to add innovative features to the recommender system’s menu, or to personalize the experience for a new population of players. This leads to the notorious cold start problem: how do you personalize the experience for players whose tastes you do not know, or for features that lack data?
- Data silos pose a significant obstacle to the platform’s cross-studio ambitions.
- Even the treatments themselves should be personalized: for instance, if two players receive the same treatment, such as coins, the quantity each receives should be based on their recent activity within the game. How would a Netflix recommender system adapt to a personalized storyline across its movie collection? Would DiCaprio in Titanic die for some viewers, but not for others?
- The duration of campaigns adds yet another layer of complexity. Altogether, it’s a true jigsaw puzzle waiting to be solved.
[Call to action: Can you relate to these challenges in your business? Have you found any creative solutions to tackle them, or ways to avoid them altogether?]
When faced with the cold start problem, we opted for reinforcement learning. It felt like diving headfirst into uncharted waters with no historical data to rely on.
We found our guiding star in multi-armed bandits, especially Bayesian ones. In the vast galaxies of ML models, they shine for their rapid learning ability, their skill at solving the cold start problem, and their flexibility to handle varying campaign durations.
The simplest of these models, the Stochastic Multi-Armed Bandit (SMAB), is remarkably capable of operating with limited data, whether due to cold start or silos. However, it is not inherently designed for personalization or to differentiate between players’ tastes.
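To make this concrete, here is a minimal sketch of a Bayesian SMAB driven by Thompson sampling (illustrative only, not our production code; treatment names are hypothetical). Each treatment keeps a Beta posterior over its success probability, a draw from each posterior decides which treatment to serve, and observed rewards update the posteriors. Starting from a uniform prior is precisely what lets it operate under cold start:

```python
import random

class BetaBernoulliBandit:
    """Minimal Bayesian SMAB: one Beta(alpha, beta) posterior per treatment."""

    def __init__(self, treatments):
        # Beta(1, 1) is a uniform prior, so the bandit can start with
        # zero historical data -- exactly the cold start setting.
        self.posteriors = {t: [1.0, 1.0] for t in treatments}

    def select(self):
        # Thompson sampling: draw one sample from each posterior and
        # serve the treatment with the highest sampled success rate.
        samples = {t: random.betavariate(a, b)
                   for t, (a, b) in self.posteriors.items()}
        return max(samples, key=samples.get)

    def update(self, treatment, reward):
        # reward is 1 (e.g., the player engaged with the offer) or 0.
        self.posteriors[treatment][0] += reward
        self.posteriors[treatment][1] += 1 - reward

bandit = BetaBernoulliBandit(["coins_small", "coins_large", "no_offer"])
treatment = bandit.select()
bandit.update(treatment, reward=1)
```

Because exploration falls naturally out of posterior uncertainty, the model needs no separate explore/exploit schedule, which is also what makes it forgiving about varying campaign durations.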
[Call to action: Join us on our journey to personalization and stay ahead of the curve with our insights!]
When ML professionals speak of personalization, they often mean very fine segmentation. Many models used for personalization do not really treat an individual as unique; rather, they assign the individual to a segment of like-minded people with similar tastes and characteristics. We are social creatures, after all!
We used this principle to introduce a preliminary level of “personalization” to our solution, overcoming the fact that the SMAB does not inherently provide it, while still retaining all of the model’s advantages.
Specifically, our approach involved developing a segmentation and deploying a separate SMAB for each segment. The result was nothing short of remarkable: a 5+% KPI uplift compared to the best expert treatment for a highly optimized studio, and a significant uplift compared to the absence of treatment. It was indeed a successful run!
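In code, this “personalization by segmentation” boils down to keying an independent bandit off each segment. The sketch below reuses the BetaBernoulliBandit class from the SMAB sketch above; the segment names and the toy assignment rule are invented for illustration:

```python
TREATMENTS = ["coins_small", "coins_large", "no_offer"]
SEGMENTS = ["new_player", "casual", "highly_engaged"]  # illustrative only

# One independent SMAB per segment.
bandits = {seg: BetaBernoulliBandit(TREATMENTS) for seg in SEGMENTS}

def assign_segment(player):
    # Toy rule based on recent activity; a real segmentation is richer.
    days = player["days_active_last_week"]
    if days == 0:
        return "new_player"
    return "highly_engaged" if days >= 5 else "casual"

def recommend(player):
    segment = assign_segment(player)
    return segment, bandits[segment].select()

def record_feedback(segment, treatment, reward):
    # Feedback updates only the player's own segment, so each segment
    # converges to its own best treatment.
    bandits[segment].update(treatment, reward)
```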
In parallel, we have been addressing the data silos in preparation for the next step: true personalization with the Contextual Multi-Armed Bandit (CMAB) model, where treatment reward probabilities depend on individual player attributes.
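While our CMAB rollout is still in progress, one classic way to make reward estimates depend on player attributes is linear Thompson sampling, sketched below. This is a textbook illustration of the contextual idea, not necessarily the algorithm inside OptX-AI: each treatment learns a Bayesian linear model of reward on player features, so the recommendation varies with the individual player.

```python
import numpy as np

class LinearThompsonCMAB:
    """Sketch of a contextual bandit: linear Thompson sampling per treatment."""

    def __init__(self, treatments, n_features, noise=0.5):
        self.noise = noise
        self.B = {t: np.eye(n_features) for t in treatments}    # precision
        self.f = {t: np.zeros(n_features) for t in treatments}  # reward sums

    def select(self, x):
        # x: player feature vector (e.g., recent activity, spend tier).
        scores = {}
        for t in self.B:
            cov = np.linalg.inv(self.B[t])
            mu = cov @ self.f[t]
            # Sample a plausible weight vector from the posterior,
            # then score this player's features against it.
            theta = np.random.multivariate_normal(mu, self.noise**2 * cov)
            scores[t] = x @ theta
        return max(scores, key=scores.get)

    def update(self, treatment, x, reward):
        self.B[treatment] += np.outer(x, x)
        self.f[treatment] += reward * x

cmab = LinearThompsonCMAB(["coins_small", "coins_large"], n_features=3)
x = np.array([0.7, 0.2, 1.0])  # illustrative player attributes
t = cmab.select(x)
cmab.update(t, x, reward=1)
```

The key contrast with the SMAB: select now takes a feature vector, so two players can receive different treatments from the very same model state.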
We recently released our own Python library, pybandits (https://github.com/PlaytikaOSS/pybandits), as open source. While we are adding more bandit implementations and functionality, we welcome feedback and contributions from the community.
Reinforcement learning-based models in the hands of business users as a self-service
Despite its benefits, the adoption of AI products can still face resistance, distrust, and fear from non-technical business users.
We engaged a reliable incubation partner to test our models and infrastructure and to obtain business feedback. Importantly, this collaboration also provided us with a set of trustworthy use cases, enabling a compelling pitch for others to join in.
And while this helped drive adoption of the solution by other business units, making the tool a self-service presented a real challenge.
Our UI, designed to assist business operators, turned out to be challenging for many to navigate. This made us realize that our tool needed to be more user-friendly and to minimize misconfigurations.
[Call to action: Share your UI/UX challenges and how you overcame them – your experiences could help others!]
To address these challenges, we’re focusing on several key areas:
- Enhancing the user interface to maximize user-friendliness and minimize errors with automated checks (see the sketch after this list).
- Developing an “auto” button to simplify decision-making.
- Establishing an “account manager” role to provide guidance.
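To give a flavor of what those automated checks can look like, here is a minimal validation sketch. All field names and rules are hypothetical, not the actual OptX-AI configuration schema; the point is that even a few lines of checks catch the most common misconfigurations before a campaign goes live:

```python
def validate_campaign(config: dict) -> list[str]:
    """Return human-readable problems; an empty list means the config is OK.

    All field names and rules are illustrative, not a real schema.
    """
    problems = []
    if not config.get("treatments"):
        problems.append("A campaign needs at least one treatment.")
    if config.get("duration_days", 0) <= 0:
        problems.append("Campaign duration must be a positive number of days.")
    if config.get("start_date") and config.get("end_date"):
        if config["end_date"] <= config["start_date"]:
            problems.append("End date must come after the start date.")
    return problems

# Surfaced in the UI at save time, instead of failing silently mid-campaign.
for msg in validate_campaign({"treatments": [], "duration_days": -1}):
    print(msg)
```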
Our journey is far from over. We are continually learning and exploring how best to equip non-technical users with a complex AI tool as a self-service, and the journey is indeed exciting! As we strive to anticipate and create novel experience flows for our users, our infrastructure and AI algorithms continue to evolve to address complex business needs.