Bellman equation

@glimpse [ coming soon ]

@wiki https://en.wikipedia.org/wiki/Bellman_equation

The Bellman equation was formulated by Richard Bellman as a way to relate the value of a state to all the future actions and states of an MDP.

The equation defines the value of a state as the expected sum of discounted rewards, where each reward is weighted by the probability of its occurrence under a given policy π.
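Putting this into symbols (each term is explained below), the state-value form of the Bellman equation reads:

Vπ(s) = Σ_a π(a∣s) Σ_s′ P(s′∣s,a) ( R + γVπ(s′) )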

π(a∣s) is the probability of taking an action a in a state s under the policy π.

The policy π simply states what the agent ought to do in each state; it can be stochastic or deterministic.

P(s′∣s,a) is the probability of moving to the next state s′ and receiving the reward R, given our current state s and the action a taken.

(R + γVπ(s′)) is the immediate reward plus the discounted value of the next state s′.
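As an illustration, here is a minimal sketch of iterative policy evaluation, which repeatedly applies the Bellman equation as an update until the value function stops changing. The two-state MDP, its transition table P, the rewards, and the policy pi are all invented for this example.

```python
# Minimal sketch of iterative policy evaluation on a made-up 2-state MDP.
# The states, actions, transitions, rewards and policy below are invented
# purely for illustration.

gamma = 0.9            # discount factor γ
states = [0, 1]
actions = [0, 1]

# P[s][a] is a list of (probability, next_state, reward) triples,
# i.e. P(s'|s,a) together with the reward R for that transition.
P = {
    0: {0: [(1.0, 0, 0.0)],                    # stay in state 0, no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},    # mostly move to state 1
    1: {0: [(1.0, 1, 2.0)],
        1: [(1.0, 0, 0.0)]},
}

# A stochastic policy: pi[s][a] = probability of taking action a in state s.
pi = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 1.0, 1: 0.0},
}

# Apply the Bellman equation as an update until the values converge.
V = {s: 0.0 for s in states}
for _ in range(1000):
    new_V = {}
    for s in states:
        new_V[s] = sum(
            pi[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in actions
        )
    if max(abs(new_V[s] - V[s]) for s in states) < 1e-8:
        break
    V = new_V

print(V)   # approximate Vπ(s) for each state
```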
