Bellman equation

@glimpse [ coming soon ]

@wiki https://en.wikipedia.org/wiki/Bellman_equation

The Bellman equation was formulated by Richard Bellman as a way to relate the value of a state to all the future actions and states of an MDP.

The equation defines the value of a state as the expected sum of discounted rewards, where each reward is weighted by the probability of its occurrence under a given policy π.
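Putting this into symbols (each term is explained below), the state-value form of the Bellman equation reads:

Vπ(s) = Σ_a π(a∣s) Σ_s′ P(s′∣s,a) ( R + γVπ(s′) )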

π(a∣s) is the probability of taking an action a in a state s under the policy π.

The policy π simply states what the agent ought to do in each state; it can be stochastic or deterministic.

P(s′∣s,a) is the probability of moving to the next state s′ and receiving the reward R, given our current state s and the action a taken.

(R + γVπ(s′)) is the immediate reward plus the discounted value of the next state s′.
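As an illustration, here is a minimal sketch of iterative policy evaluation, which repeatedly applies the Bellman equation as an update until the value function stops changing. The two-state MDP, its transition table P, the rewards, and the policy pi are all invented for this example.

```python
# Minimal sketch of iterative policy evaluation on a made-up 2-state MDP.
# The states, actions, transitions, rewards and policy below are invented
# purely for illustration.

gamma = 0.9            # discount factor γ
states = [0, 1]
actions = [0, 1]

# P[s][a] is a list of (probability, next_state, reward) triples,
# i.e. P(s'|s,a) together with the reward R for that transition.
P = {
    0: {0: [(1.0, 0, 0.0)],                    # stay in state 0, no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},    # mostly move to state 1
    1: {0: [(1.0, 1, 2.0)],
        1: [(1.0, 0, 0.0)]},
}

# A stochastic policy: pi[s][a] = probability of taking action a in state s.
pi = {
    0: {0: 0.5, 1: 0.5},
    1: {0: 1.0, 1: 0.0},
}

# Apply the Bellman equation as an update until the values converge.
V = {s: 0.0 for s in states}
for _ in range(1000):
    new_V = {}
    for s in states:
        new_V[s] = sum(
            pi[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in actions
        )
    if max(abs(new_V[s] - V[s]) for s in states) < 1e-8:
        break
    V = new_V

print(V)   # approximate Vπ(s) for each state
```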
