Warehouse Robot (Gridworld)

Discrete actions in a 2D warehouse: pick items from shelves, then drop them at the dropoff. Try manual control, a random policy, or train a tabular Q-learning agent.
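The tabular Q-learning agent mentioned above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual implementation: the state encoding (a tuple), the action names, and the hyperparameters are all assumptions.

```python
import random
from collections import defaultdict

# Illustrative action set; the demo's actual action names may differ.
ACTIONS = ["up", "down", "left", "right", "pick", "drop"]

# Q-table: maps (state, action) pairs to values, defaulting to 0.0.
Q = defaultdict(float)

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, epsilon=0.1):
    """Explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```

A random policy is just `epsilon_greedy` with `epsilon=1.0`, and a greedy rollout uses `epsilon=0.0` over a trained table.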

Edit modes. Use these modes to configure the environment by clicking cells in the grid:
- Shelves: click a cell to add a shelf; keep clicking to increase its item count. Clicking past the maximum removes the shelf.
- Start: click a cell to set the robot's starting position.
- Dropoff: click a cell to set where items must be delivered.
Cumulative reward (per timestep). This line adds up rewards over time during the latest run:
- Big upward jumps usually mean a successful dropoff.
- Gradual downward drift comes from the small step cost.
- Sharp drops can come from invalid actions.
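The curve described above is just a running sum of per-step rewards. A small sketch, with illustrative reward constants (the step cost, dropoff bonus, and invalid-action penalty below are assumptions, not the demo's actual values):

```python
def cumulative(rewards):
    """Turn a list of per-step rewards into the cumulative-reward curve."""
    total, curve = 0.0, []
    for r in rewards:
        total += r
        curve.append(total)
    return curve

# A short run: two normal steps (-0.01 step cost each), one invalid
# action (-0.1), one more step, then a successful dropoff (+1.0).
rewards = [-0.01, -0.01, -0.1, -0.01, 1.0]
curve = cumulative(rewards)
```

The gradual drift, sharp drop, and upward jump in the list above map directly to the three curve shapes described.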
State panel (latest readout): Position (1,1) · Carrying 0/2 · Delivered 0/5 · Return 0.00
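The panel's fields hint at what a tabular state could look like: anything the agent needs to distinguish must be part of the Q-table key. A sketch under that assumption (the demo's actual state encoding may differ):

```python
from typing import NamedTuple

class State(NamedTuple):
    """One plausible tabular state, mirroring the panel's readouts."""
    x: int          # grid position
    y: int
    carrying: int   # items held, 0..2 per the "Carrying 0 / 2" readout
    delivered: int  # items delivered, 0..5 per the "Delivered 0 / 5" readout

s = State(x=1, y=1, carrying=0, delivered=0)
```

Because `NamedTuple` instances are hashable tuples, they work directly as dictionary keys in a Q-table.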
Use the arrow keys plus P (pick) and D (drop), or click the on-screen buttons.
Markov Decision Process. This is an empirical view from the latest rollout (not the full underlying MDP):
- Node color: how often the agent visited that state.
- Arrows: how often an action was taken from a state (movement actions only). Hover an arrow to see the state → action probability and the most recently observed next state.
Colors: P(state) from latest rollout; edges: P(action|state) ≥ 0.05
Run an episode (random or greedy) to populate probabilities.
(State graph: one node per cell of the 10×8 grid, x = 0–9, y = 0–7; every P(state) reads 0.000 until an episode is run.)
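The quantities the graph displays can be estimated from a single logged rollout by counting. A sketch, assuming the trajectory is logged as a list of (state, action) pairs (the demo's logging format is not shown):

```python
from collections import Counter, defaultdict

def empirical_probs(trajectory):
    """Estimate P(state) and P(action | state) from one rollout.

    trajectory: list of (state, action) pairs, in order.
    """
    n = len(trajectory)
    # P(state): fraction of timesteps spent in each state.
    p_state = {s: c / n for s, c in Counter(s for s, _ in trajectory).items()}
    # P(action | state): per-state action frequencies.
    counts = defaultdict(Counter)
    for s, a in trajectory:
        counts[s][a] += 1
    p_action = {s: {a: c / sum(cnt.values()) for a, c in cnt.items()}
                for s, cnt in counts.items()}
    return p_state, p_action

traj = [((0, 0), "right"), ((1, 0), "right"), ((1, 0), "up"), ((1, 1), "pick")]
p_state, p_action = empirical_probs(traj)
```

The rendered graph additionally hides edges with P(action|state) below the 0.05 threshold noted above, which is a simple filter over `p_action`.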