**Abstract:** Planning in hybrid systems with both discrete and continuous control variables is important for real-world applications such as extra-planetary exploration and multi-vehicle transportation systems. Meanwhile, generating high-quality solutions for given hybrid planning specifications is crucial to building high-performance hybrid systems. However, since hybrid planning is challenging in general, most methods use greedy search guided by various heuristics, which is neither complete nor optimal and often degenerates into blind search toward an infinite-action plan. In this paper, we present a hybrid automaton planning formalism and propose an optimal approach that encodes this planning problem as a Mixed Integer Linear Program (MILP) by fixing the number of actions in automaton runs. We also show an extension of our approach for reasoning over temporally concurrent goals. By leveraging an efficient MILP optimizer, our method generates provably optimal solutions for complex mixed discrete-continuous planning problems within a reasonable time. We use several case studies to demonstrate the strong performance of our hybrid planning method and show that it outperforms a state-of-the-art hybrid planner, Scotty, in both efficiency and solution quality.

## Planning with Linear Hybrid Automata

In this work, we use automata to specify our planning problem. A linear hybrid automaton with inputs is a tuple $\mathcal{H} = \langle V = (Q \cup E), q_\texttt{init}, q_\texttt{goal}, J, F \rangle$, where:

1. $Q = L \cup X$ is the set of state variables, consisting of the discrete state variables $L$ and the continuous state variables $X$.
2. $E$ is the set of input variables.
3. $q_\texttt{init}$ is the initial state, and $q_\texttt{goal}$ is a predicate that represents a set of goal states.
4. $J$ is the set of jumps. Each jump is associated with a condition $\textit{cond}$ and an effect $\textit{eff}$. The condition $\textit{cond}$ is a predicate, also known as the guard condition or the enabling condition of the jump; the effect $\textit{eff}$ specifies how the values of the state variables change when the jump occurs.
5. $F$ is the set of flows for the state variables. Each flow is associated with a differential equation and a condition. At any time, multiple flows $f \in F$ can be active, and together they specify the evolution of the continuous state variables.

A solution to the planning problem is a valid run of this automaton from $q_\texttt{init}$ to a state that satisfies $q_\texttt{goal}$.
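To make the tuple concrete, here is a minimal Python sketch of a run checker. It rests on simplifying assumptions that are ours, not the paper's: every flow has constant rates (so its differential equation integrates to a straight line), input variables are folded into those rates, and each condition is a conjunction of linear inequalities, so checking a flow's condition at both ends of a step is enough. All class and function names, and the rover model, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Hypothetical state representation: discrete (str) and continuous (float) values by name.
State = Dict[str, Union[str, float]]


@dataclass
class Jump:
    name: str
    cond: Callable[[State], bool]   # guard (enabling) condition
    eff: Callable[[State], State]   # new values of the variables the jump changes


@dataclass
class Flow:
    name: str
    cond: Callable[[State], bool]   # must hold while the flow is active
    rates: Dict[str, float]         # constant derivatives of continuous variables


@dataclass
class JumpStep:
    jump: Jump


@dataclass
class FlowStep:
    flows: List[Flow]               # flows active together during this step
    duration: float                 # elapsed time of the step


def is_valid_run(init: State, goal: Callable[[State], bool],
                 steps: List[Union[JumpStep, FlowStep]]) -> bool:
    state = dict(init)
    for step in steps:
        if isinstance(step, JumpStep):
            if not step.jump.cond(state):
                return False
            state = {**state, **step.jump.eff(state)}  # jumps are instantaneous
            continue
        if step.duration < 0 or not all(f.cond(state) for f in step.flows):
            return False
        nxt = dict(state)
        for f in step.flows:                 # rates of all active flows add up
            for var, rate in f.rates.items():
                nxt[var] = nxt[var] + rate * step.duration
        # Assuming conditions are conjunctions of linear inequalities (convex sets),
        # checking both endpoints of the straight-line trajectory suffices.
        if not all(f.cond(nxt) for f in step.flows):
            return False
        state = nxt
    return goal(state)


# Hypothetical rover: one flow moves it, a separate flow drains its battery.
start = Jump("start", lambda s: s["mode"] == "parked", lambda s: {"mode": "driving"})
drive = Flow("drive", lambda s: s["mode"] == "driving", {"pos": 1.0})
discharge = Flow("discharge", lambda s: s["mode"] == "driving" and s["battery"] >= 0,
                 {"battery": -2.0})
init: State = {"mode": "parked", "pos": 0.0, "battery": 30.0}


def goal(s: State) -> bool:
    return s["pos"] >= 10
```

With these definitions, `is_valid_run(init, goal, [JumpStep(start), FlowStep([drive, discharge], 10.0)])` returns `True`, whereas a 20-second flow step drains the battery to -10, violates the `discharge` condition, and is rejected.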

*Figures: Mars transportation example; automaton example.*

## Solving as Mixed Integer Linear Programs

We encode the automaton above as a MILP and solve it with Gurobi. Our MILP encoding is a mixed integer-linear extension of the linear program encoding of flow tubes with linear dynamics. In this encoding, the elapsed time of each action is modeled as a decision variable, so step durations adapt to the plan. Consequently, the encoding requires far fewer variables than discretizing timelines with fixed time steps and can support long-horizon planning.
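The saving from adaptive durations can be illustrated with a back-of-the-envelope count. The sketch below assumes, hypothetically, that every step carries the same number of variables and that on a fixed grid each action must start and end on grid points, so an action of duration `d` occupies `ceil(d / dt)` steps; the adaptive encoding needs one step per action plus one elapsed-time variable per step. The durations and the per-step variable count are made-up numbers, not figures from the paper.

```python
import math
from typing import List


def fixed_grid_steps(durations: List[float], dt: float) -> int:
    """Steps needed when every action must start and end on a grid of width dt."""
    return sum(math.ceil(d / dt) for d in durations)


def adaptive_steps(durations: List[float]) -> int:
    """One step per action: the elapsed time of each step is itself a decision variable."""
    return len(durations)


def variable_count(num_steps: int, vars_per_step: int, adaptive: bool) -> int:
    # The adaptive encoding adds one elapsed-time variable per step.
    return num_steps * (vars_per_step + (1 if adaptive else 0))


durations = [3.7, 12.25, 0.5, 40.0]   # hypothetical action durations (s)
vars_per_step = 12                     # hypothetical state/input/activation variables per step

fixed = fixed_grid_steps(durations, dt=1.0)
adaptive = adaptive_steps(durations)
print(fixed, variable_count(fixed, vars_per_step, adaptive=False))       # 58 696
print(adaptive, variable_count(adaptive, vars_per_step, adaptive=True))  # 4 52
```

The fixed grid also rounds the makespan up from 56.45 s to 58 s. The gap widens as the grid gets finer or the horizon longer, while the adaptive count depends only on the number of actions.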

We define state variables $\{Q_0, Q_1, \ldots, Q_n\}$, where $Q_i$ holds the values of $Q$ after action $a_i$ occurs and right before $a_{i+1}$ occurs. We also define input variables $\{E_0, E_1, \ldots, E_{n-1}\}$, where $E_i$ holds the values of $E$ when $a_i$ occurs. To represent the actions that happen at each step, we define sets of binary activation variables $\{P_0, P_1, \ldots, P_{n-1}\}$. $P_i$ is the union of $P^J_i$ and $P^F_i = P^{F_1}_i \cup P^{F_2}_i \cup \cdots \cup P^{F_K}_i$, which are the activation variables at step $i$ for the jumps $J$ and the flows $\{F_1, F_2, \ldots, F_K\}$, respectively. Each $p_i^o \in P_i$ corresponds to an operator $o$ (i.e., a jump or a flow) at step $i$: if $p_i^o = 1$, operator $o$ is active at step $i$; otherwise, $o$ is inactive. To fully determine the effects of flows, we also model the elapsed time and the cumulative effects of the input variables. Thus, we define a variable $d_i \in [0, \infty)$ for the elapsed time during step $i$, and a real variable $\Delta_i = \int_{0}^{d_i} E_i \, dt$ for the cumulative effect of $E_i$ during step $i$.
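Keeping the cumulative input linear is what lets durations stay adaptive: with a constant input, $\Delta_i = E_i d_i$ is bilinear in two decision variables. One way around this, in the spirit of flow-tube encodings and sketched here under our own assumption that each input is bounded by a box $[e_{\min}, e_{\max}]$, is to keep only the linear constraint $e_{\min} d_i \le \Delta_i \le e_{\max} d_i$. The relaxation loses nothing: any $\Delta_i$ satisfying it is produced by the constant input $E_i = \Delta_i / d_i$ when $d_i > 0$, and $d_i = 0$ forces $\Delta_i = 0$. The function names are hypothetical, and this is not necessarily the exact constraint form used in the paper.

```python
from typing import Tuple


def delta_bounds(d: float, e_min: float, e_max: float) -> Tuple[float, float]:
    """Linear flow-tube constraint: e_min * d <= Delta <= e_max * d, for d >= 0."""
    return e_min * d, e_max * d


def recover_constant_input(delta: float, d: float, e_min: float, e_max: float,
                           tol: float = 1e-9) -> float:
    """Return a constant input E in [e_min, e_max] whose integral over [0, d] is delta (up to tol)."""
    lo, hi = delta_bounds(d, e_min, e_max)
    if not (lo - tol <= delta <= hi + tol):
        raise ValueError("delta is outside the flow tube for this duration")
    if d == 0.0:
        return e_min                   # zero-length step: delta is 0, any admissible input works
    e = delta / d
    return min(max(e, e_min), e_max)   # clamp away floating-point round-off
```

For example, with $e_{\min} = -1$, $e_{\max} = 2$, and $d_i = 4$, the tube is $[-4, 8]$, and $\Delta_i = 6$ is realized by the constant input $E_i = 1.5$.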

The objective is to minimize the total elapsed time $\sum_{i=0}^{n-1} d_i$. There are four kinds of constraints: (1) the initial and goal states are respected; (2) at each step, either a jump or a set of flows is active; (3) activated jumps satisfy their conditions and apply their effects; and (4) activated flows satisfy their conditions and apply their effects. Implications and conditional activations are encoded with the Big-M method. The numbers of variables and constraints grow only linearly with the numbers of variables and operators in the automaton.
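As an illustration of the Big-M step, the sketch below turns an implication $p_i^o = 1 \Rightarrow a^\top x \le b$ (for instance, an operator's condition written as a linear inequality) into the single row $a^\top x \le b + M(1 - p_i^o)$, and picks $M$ from assumed variable bounds so that the row is slack whenever the operator is inactive. The bounds, the example inequality, and the function names are hypothetical; a brute-force check over a grid confirms that the row and the implication agree.

```python
from itertools import product
from typing import Sequence, Tuple


def big_m(a: Sequence[float], b: float, bounds: Sequence[Tuple[float, float]]) -> float:
    """Smallest M such that a.x <= b + M holds for every x in the box `bounds`."""
    worst = sum(ai * (hi if ai > 0 else lo) for ai, (lo, hi) in zip(a, bounds))
    return max(0.0, worst - b)


def big_m_row(a: Sequence[float], b: float, M: float, x: Sequence[float], p: int) -> bool:
    """MILP row a.x <= b + M * (1 - p) with binary activation p."""
    return sum(ai * xi for ai, xi in zip(a, x)) <= b + M * (1 - p)


def implication(a: Sequence[float], b: float, x: Sequence[float], p: int) -> bool:
    """Logical constraint: p = 1 implies a.x <= b."""
    return p == 0 or sum(ai * xi for ai, xi in zip(a, x)) <= b


# Hypothetical linear condition x1 - x2 <= 2 with x1, x2 in [0, 10].
a, b = [1.0, -1.0], 2.0
bounds = [(0.0, 10.0), (0.0, 10.0)]
M = big_m(a, b, bounds)  # worst case x1 = 10, x2 = 0 gives 10, so M = 8
grid = [(float(u), float(v)) for u, v in product(range(11), repeat=2)]

agree = all(big_m_row(a, b, M, x, p) == implication(a, b, x, p)
            for x in grid for p in (0, 1))
too_small = all(big_m_row(a, b, 5.0, x, p) == implication(a, b, x, p)
                for x in grid for p in (0, 1))
print(M, agree, too_small)  # 8.0 True False
```

Choosing $M$ as small as the bounds allow matters in practice: a valid but needlessly large $M$ weakens the LP relaxation, while one that is too small (here $M = 5$) wrongly cuts off states in which the operator is inactive.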

## Experimental Results

To demonstrate the efficiency and solution quality of our method, we benchmarked it against Scotty on three domains: Mars transportation, air refueling, and truck-and-drone delivery. Beyond handling different dynamics under a large number of modes, all three domains require judiciously coordinating heterogeneous agent teams and carefully reasoning over resources to decide when recharging or refueling is necessary. The experimental results show that our approach finds high-quality solutions for all problems within seconds and proves optimality for most of them, while Scotty fails to solve half of the problems within 600 seconds. Moreover, the first solutions our method returns within $1$ second already have better makespans than Scotty's solutions, and our final solutions improve these makespans significantly.

### Mars Transportation Domain

The Mars transportation domain involves reasoning over obstacle avoidance and battery consumption on different terrains so that the astronaut, with the help of the rover, reaches the destination in the shortest time. We test four cases: (1) the rover directly picks up the astronaut and delivers them to the destination; (2) the rover does not have enough battery for either the trip or reaching the charging station, so the astronaut has to walk; (3) the rover picks up and delivers the astronaut but has to recharge during the trip; (4) the rover picks up and delivers the astronaut after recharging.

### Air Refueling Domain

In this domain, autonomous Unmanned Aerial Vehicles (UAVs) need to take pictures of several regions before landing at the destination. Since a UAV has limited fuel, it may need to refuel in the air from a tanker plane. We test four cases: (1) the UAV takes photos of three regions and does not need refueling; (2) the UAV takes photos of four regions and refuels once along the route; (3) the UAV takes photos of ten regions and refuels twice along the route; (4) two UAVs take photos of eight regions along two different routes, where one UAV does not need refueling and the other refuels once.