Warehouse Robot (Gridworld)
Discrete actions in a 2D warehouse: pick items from shelves, then drop them at the dropoff. Try manual control, a random policy, or train a tabular Q-learning agent.
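The tabular Q-learning option can be illustrated with a minimal sketch of the standard update rule. The action names, the state encoding (position, items carried, items delivered), and the values of `alpha`, `gamma`, and `epsilon` are assumptions made for illustration, not the demo's actual settings.

```python
import random
from collections import defaultdict

# Assumed action set: four moves plus pick and drop (names are hypothetical).
ACTIONS = ["up", "down", "left", "right", "pick", "drop"]


def make_q_table():
    """Q-table mapping each state to a dict of action values, all starting at 0."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    return max(ACTIONS, key=lambda a: values[a])


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.95):
    """Standard Q-learning update: Q(s,a) += alpha * (target - Q(s,a))."""
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical state encoding: (position, items carried, items delivered).
q = make_q_table()
s0 = ((1, 1), 0, 0)
s1 = ((2, 1), 0, 0)
q_update(q, s1, "right", 1.0, s0, done=True)
q_update(q, s0, "right", 0.0, s1, done=False)
```

With `epsilon=0.0` the agent acts purely greedily on the learned values, which corresponds to the "greedy" episode mode; a random policy corresponds to `epsilon=1.0`.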
Edit
Use these modes to configure the environment by clicking cells in the grid.
- Shelves: click a cell to add a shelf; click again to increase its item count; clicking past the maximum removes the shelf.
- Start: click a cell to set the robot’s starting position.
- Dropoff: click a cell to set where items must be delivered.
Cumulative reward (per timestep)
This line adds up rewards over time during the latest run.
- Big upward jumps usually mean a successful dropoff.
- Gradual downward drift comes from the small step cost.
- Sharp drops can come from invalid actions.
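The three curve shapes described above follow from a running sum of per-step rewards. The sketch below uses made-up reward values (`STEP_COST`, `DROPOFF_REWARD`, `INVALID_PENALTY` are hypothetical; the demo's real values are not stated here) to show how each kind of event moves the line.

```python
from itertools import accumulate

# Hypothetical reward values, chosen only to illustrate the curve's shape.
STEP_COST = -0.01        # small cost per step: gradual downward drift
DROPOFF_REWARD = 1.0     # successful dropoff: big upward jump
INVALID_PENALTY = -0.5   # invalid action: sharp drop


def cumulative_rewards(rewards):
    """Return the running total of rewards, one value per timestep."""
    return list(accumulate(rewards))


# Two ordinary steps, one invalid action, one more step, then a dropoff.
rewards = [STEP_COST, STEP_COST, INVALID_PENALTY, STEP_COST, DROPOFF_REWARD]
curve = cumulative_rewards(rewards)
```

The final value of `curve` is the episode's undiscounted return, which is what the Return readout would show under these assumptions.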
State
Position: (1,1)
Carrying: 0 / 2
Delivered: 0 / 5
Return: 0.00
Use the arrow keys to move and the P/D keys to pick up or drop items, or click the buttons.
Markov Decision Process
This is an empirical view from the latest rollout (not the full underlying MDP).
- Node color: how often the agent visited that state.
- Arrows: how often an action was taken from a state (movement actions only).
Hover an arrow to see state → action probability and the most recently observed next state.
Colors: P(state) from the latest rollout; edges: shown only where P(action|state) ≥ 0.05
Run an episode (random or greedy) to populate probabilities.
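One way such an empirical view could be computed from a rollout is sketched below. The rollout format (a list of `(state, action, next_state)` tuples), the movement action names, and the choice to normalize P(action|state) over all actions taken from a state are assumptions; only the 0.05 threshold and the "movement actions only" rule come from the description above.

```python
from collections import Counter, defaultdict

# Assumed names for the movement actions; only these get arrows.
MOVES = {"up", "down", "left", "right"}


def empirical_view(rollout, min_prob=0.05):
    """Estimate P(state) and P(action|state) from (state, action, next_state) tuples."""
    state_counts = Counter(s for s, _, _ in rollout)
    action_counts = defaultdict(Counter)
    last_next = {}
    for s, a, s2 in rollout:
        action_counts[s][a] += 1
        last_next[(s, a)] = s2  # most recently observed next state

    total = sum(state_counts.values())
    p_state = {s: c / total for s, c in state_counts.items()}

    edges = {}
    for s, counts in action_counts.items():
        n = sum(counts.values())
        for a, c in counts.items():
            p = c / n
            # Keep movement actions whose probability clears the threshold.
            if a in MOVES and p >= min_prob:
                edges[(s, a)] = (p, last_next[(s, a)])
    return p_state, edges


rollout = [
    ((1, 1), "right", (2, 1)),
    ((2, 1), "pick", (2, 1)),
    ((2, 1), "left", (1, 1)),
    ((1, 1), "right", (2, 1)),
]
p_state, edges = empirical_view(rollout)
```

Here `p_state` would drive the node colors, and each entry of `edges` carries the two values shown on hover: the state → action probability and the latest observed next state.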