Second Place Solution in Fall 2025 Simulation Racing Series
Title: Outcome-driven Optimization Loop for ROAR Simulation Racing Competition
Written by Youssef Khalafalla, a senior at John C. Kimball High School, on behalf of Kimball Jags
Abstract
A lightweight learning loop that sits outside the simulator was implemented to improve completion time, using a numeric reward model that is asymmetric to reflect risk. Positive rewards were applied for completing a section without incident, completing a full lap, and improving completion time relative to the previous lap. Negative rewards were applied for CAS activation, crashes, and an increase in completion time relative to the previous run. Manual optimization combined with AI optimization resulted in a measurable reduction in completion time.
Introduction
The University of California, Berkeley Robot Open Autonomous Racing (ROAR) program, part of the FHL Vive Center for Enhanced Reality in the College of Engineering, introduces students to autonomous driving algorithms and organizes three simulation races every year. ROAR has received funding from industry sponsors, the National Science Foundation, and the Army Research Laboratory [1]. This paper presents the approach used for the second-place submission in the sixteenth simulation race, organized by ROAR in the fall of 2025.
Background
Performance of UC Berkeley ROAR solutions has converged over multiple seasons. The top seven teams in the Summer 2025 race finished within one percent of the winning completion time; the spread shrinks to 0.13% (0.4 seconds) for the top three teams [2]. This indicates that competitors have achieved near-maximum optimization of their solutions over the past six races, all run on the official Berkeley Monza Map v1.1 [3]. It was apparent that refining the waypoints, the section divisions of the track, and the steering, braking, and throttle controls using traditional methods would not be effective in further reducing the completion time. Introducing an AI learning loop, in addition to manually refining the variables controlling performance, was essential to improve the completion time.
The solution presented builds on the Kimball Jags third-place Summer 2025 controller [4], which introduced a Collision Avoidance System (CAS) that reduces crash rates and thereby allows a higher average speed. There was room to improve the CAS itself by refining, per section, the Time till Collision (TTC) threshold that activates the system, depending on the car's direction at the time of a potential collision.
Optimization of Controller Variables
The code written for Kimball Jags [4] was revised to allow additional refinement of variables. Kimball Jags' solution relied on dividing the track into ten sections. The main variables that affected the completion time were target speed, brakes, and Time till Collision (TTC). The target speed equation in the Kimball Jags solution was given by:

v_target = sqrt(µ · g · r)    (1)

This equation was modified to be:

v_target = Kv · sqrt(µ · g · r)    (2)

where µ is a constant with a different set value for each section,
g is the gravitational acceleration,
r is the turning radius of the upcoming waypoints, and
Kv is the variable introduced to increase the target speed.
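The modified target speed rule can be sketched in Python. This is an illustrative sketch, not the competition code: the function name, argument names, and the default gravity value are assumptions.

```python
import math

def target_speed(mu, radius, kv, g=9.81):
    """Illustrative target speed for a section: Kv scales the
    friction-limited cornering speed sqrt(mu * g * r)."""
    return kv * math.sqrt(mu * g * radius)

# Example (hypothetical values): mu = 3.0, 50 m turning radius, Kv = 1.1
v = target_speed(3.0, 50.0, 1.1)
```

With Kv = 1 the rule reduces to the original equation (1); Kv > 1 raises the target speed for that section.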
Trial runs to optimize Kv for the fastest completion time resulted in the values indicated in Table I.
Kimball Jags [4] modified the braking value for each section separately by multiplying it by a reduction multiplier. Most collisions occur on the second or third lap, because speeds are lower on the first lap, where CARLA [5] starts from stationary. An additional constant, KB, was introduced as a brake multiplier to allow a different reduction of braking for the same section in different laps. Trial runs to optimize KB for the fastest completion time were performed while implementing the Kv values shown in Table I, resulting in the values indicated in Table II.
Kimball Jags [4] noted that the CAS was triggered only in laps 2 and 3 and in sections 3 and 5, and adjusted the TTC to trigger the CAS accordingly. A variable, KT, was introduced to allow modification of the TTC simultaneously with the other variables. Table III shows the values of KT that resulted in the best completion time, based on trial runs implementing the Kv values shown in Table I and the KB values shown in Table II.
Based on the above, a text file contained the values of the three variables Kv, KB, and KT for each of the ten sections and each of the three laps. This introduced seventy refined values needed to achieve the shortest race completion time based on trial runs.
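A loader for such a per-lap, per-section tuning file can be sketched as follows. The paper does not specify the file format, so the comma-separated layout assumed here is hypothetical:

```python
def load_tuning(text):
    """Parse per-lap, per-section tuning values.

    Assumed line format (not specified in the paper):
        lap,section,Kv,KB,KT
    Returns {(lap, section): {"Kv": ..., "KB": ..., "KT": ...}}.
    """
    table = {}
    for line in text.strip().splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blanks and comment lines
        lap, section, kv, kb, kt = line.split(",")
        table[(int(lap), int(section))] = {
            "Kv": float(kv), "KB": float(kb), "KT": float(kt),
        }
    return table

# Hypothetical sample with two of the seventy entries
sample = """
# lap,section,Kv,KB,KT
1,1,1.05,0.90,1.00
2,1,1.10,0.85,0.95
"""
tuning = load_tuning(sample)
```

Keying the table by (lap, section) lets the controller look up a different multiplier set for the same section on different laps, as described above.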
Artificial Intelligence Implementation
Additional improvements in completion time could not be achieved through trial runs alone; an automated solution was necessary. A distributed evolutionary algorithm, implemented using the DEAP library, that sits outside the simulator and reasons entirely from observed outcomes was built. Each run starts from a clean slate by fully restarting CARLA and then launching competition_runner.py again before collecting any data. This reduces run-to-run variability caused by lingering physics state, timing drift, or caching behavior, and makes comparisons between configurations meaningful rather than noisy.
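The clean-slate evaluation cycle can be sketched as below. The restart, launch, and result-reading steps are passed in as callables because the actual CARLA and competition_runner.py invocation details are not given in this paper; the function and hook names are illustrative.

```python
def evaluate_configuration(config, restart_carla, launch_runner,
                           read_results, n_runs=1):
    """Evaluate one configuration from a clean slate.

    restart_carla(): fully restart the simulator (hypothetical hook).
    launch_runner(config): launch competition_runner.py with `config`.
    read_results(): read the structured results the framework wrote.
    """
    outcomes = []
    for _ in range(n_runs):
        restart_carla()        # clear lingering physics state and caches
        launch_runner(config)  # fresh competition run
        outcomes.append(read_results())
    return outcomes

# Usage with stub hooks, showing the restart-before-run ordering
calls = []
results = evaluate_configuration(
    {"mu": 3.0},
    restart_carla=lambda: calls.append("restart"),
    launch_runner=lambda cfg: calls.append("run"),
    read_results=lambda: {"crash": False},
    n_runs=2,
)
```

Restarting before every run, rather than reusing a warm simulator, is what makes outcomes from different configurations comparable.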
After each run is completed, the program reads the structured results produced by the competition framework. These include section completion times per lap, total lap times, whether CAS was activated, and whether a crash occurred. Instead of trying to predict behavior inside the simulator, the AI treats these results as evidence of how a specific configuration behaved under realistic variability. It evaluates performance by aggregating time-based and event-based signals rather than assuming that faster runs are always better. This allows it to recognize that some settings work well on average even if a single run is imperfect.
The only control variable the AI is allowed to modify is µ. This variable directly scales aggressiveness through the velocity rule given by equation (2): a higher value of µ increases corner entry and exit speed, while a lower value makes the car more conservative. Separate µ values for each section were explicitly defined because tire grip conditions, vehicle behavior, and stability change over a run; applying one value across all laps produced worse results in practice. All µ values are clamped between 2 and 4, the range determined empirically to be the safe and meaningful operating range: values below 2 were consistently too slow, and values above 4 produced frequent CAS activations and crashes.
The reward model is numeric and intentionally asymmetric to reflect risk. Completing a section without incident (no crash) adds a reward of +1. Completing a full lap adds +10. If a lap time improves relative to the best previously observed time for that lap, an additional +5 is added. Each CAS activation adds -5, because it indicates that the car exceeded stable limits even if it did not crash. A crash adds -50 and immediately terminates learning for that run, since post-crash data is not representative of the target performance during a race. If the total run time is slower than the previous reference, an additional -10 is applied even if no crash occurred.
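The reward rules above can be expressed as a small function. The signature and argument names are illustrative; the numeric weights are the ones stated in the text.

```python
def run_reward(sections_clean, laps_completed, laps_improved,
               cas_activations, crashed, slower_total):
    """Asymmetric reward for one run:
    +1 per incident-free section, +10 per completed lap,
    +5 per lap-time improvement, -5 per CAS activation,
    -50 for a crash, -10 if the total time got slower."""
    reward = (sections_clean * 1
              + laps_completed * 10
              + laps_improved * 5
              - cas_activations * 5)
    if crashed:
        reward -= 50  # run is also terminated at this point
    if slower_total:
        reward -= 10
    return reward
```

The asymmetry is visible in the magnitudes: one crash (-50) outweighs a full clean lap of ten sections plus the lap bonus (+20), so the optimizer is pushed toward configurations that are fast without flirting with instability.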
Once a run ends, the AI compares the total reward and the lap and section times against a rolling history of prior runs rather than a single baseline. This matters because CARLA exhibits natural stochasticity, and judging performance from a single comparison leads to overfitting. If the configuration shows statistically better behavior across time (faster laps, fewer CAS events, or better reward consistency), the AI nudges the corresponding lap µ upward by a small, fixed step of 0.05. If performance degrades or CAS frequency rises, the AI nudges µ downward. These updates are deliberately small, so the system explores locally around known good regions rather than jumping into unstable regimes. Because µ is adjusted independently per lap, the system can learn, for example, that a higher µ works well on Lap 1 but needs to be reduced later, once the car becomes more difficult to control.
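A minimal sketch of the per-lap µ update, assuming a boolean improvement signal produced by the rolling-history comparison (the function name and constant names are illustrative):

```python
MU_MIN, MU_MAX = 2.0, 4.0  # empirically safe operating range
STEP = 0.05                # small fixed step keeps exploration local

def update_mu(mu, improved):
    """Nudge a lap's mu up after statistically better runs,
    down otherwise, clamped to the [2, 4] range."""
    mu = mu + STEP if improved else mu - STEP
    return min(MU_MAX, max(MU_MIN, mu))
```

The clamp guarantees that even a long streak of improvements can never push µ into the regime above 4 that produced frequent CAS activations and crashes.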
The learning process ran entirely on an RTX 5070 Ti GPU. Although there is no heavy neural network, the GPU is used for fast numerical aggregation and comparison across many runs, which allows high iteration counts without becoming compute-bound. Over repeated restarts and evaluations, the AI converges toward configurations that are fast but stable under variability, rather than simply optimizing a single best-case run.
Table IV shows the values of µ obtained through the application of this lightweight learning loop.
Applying the values shown in Tables I to IV, together with the AI optimization, reduced the completion time to 321.45 seconds, compared to 322.15 seconds in the previous race, without increasing the collision rate above the 7% reported in the original solution.
Conclusion
A measurable reduction in completion time was achieved by tuning the target speed with a lightweight learning loop that sits outside the simulator and uses a numeric reward model that is asymmetric to reflect risk. Positive rewards were applied for completing a section without incident, completing a full lap, and improving completion time relative to the previous lap. Negative rewards were applied for CAS activation, crashes, and an increase in completion time relative to the previous run.
Future developments could apply the lightweight learning loop simultaneously to all variables affecting the completion time.
Acknowledgements
The guidance and mentorship of Dr. Allen Yang and Mr. Huo Chao Kuan were essential to completing this study.
References
[1] “Robot Open Autonomous Racing (ROAR™),” https://roar.berkeley.edu/
[2] “Summer 2025 Simulation Racing Series Results,” https://roar.berkeley.edu/summer-2025-simulation-racing-series-results/
[3] “Monza Map,” https://roar.berkeley.edu/monza-map/
[4] “Third Place Solution in Summer 2025 Simulation Racing Series,” https://roar.berkeley.edu/third-place-solution-in-summer-2025-simulation-racing-series/
[5] A. Dosovitskiy, G. Ros, F. Codevilla, et al., “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, ser. Proceedings of Machine Learning Research, S. Levine, V. Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 1–16. [Online]. Available: https://proceedings.mlr.press/v78/dosovitskiy17a.html