Warehouse Robot (Gridworld)

Discrete actions in a 2D warehouse: pick items from shelves, then drop them at the dropoff. Try manual control, a random policy, or train a tabular Q-learning agent.
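The tabular Q-learning agent mentioned above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual implementation: the state encoding (a tuple), the action names, and the hyperparameters are all assumptions.

```python
import random
from collections import defaultdict

# Illustrative action set; the demo's actual action names may differ.
ACTIONS = ["up", "down", "left", "right", "pick", "drop"]

# Q-table: maps (state, action) pairs to values, defaulting to 0.0.
Q = defaultdict(float)

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, epsilon=0.1):
    """Explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])
```

A random policy is just `epsilon_greedy` with `epsilon=1.0`, and a greedy rollout uses `epsilon=0.0` over a trained table.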

Edit modes. Use these modes to configure the environment by clicking cells in the grid:
- Shelves: click a cell to add a shelf; keep clicking to increase its item count. Clicking past the maximum removes the shelf.
- Start: click a cell to set the robot's starting position.
- Dropoff: click a cell to set where items must be delivered.
Cumulative reward (per timestep). This line adds up rewards over time during the latest run:
- Big upward jumps usually mean a successful dropoff.
- Gradual downward drift comes from the small step cost.
- Sharp drops can come from invalid actions.
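The curve described above is just a running sum of per-step rewards. A small sketch, with illustrative reward constants (the step cost, dropoff bonus, and invalid-action penalty below are assumptions, not the demo's actual values):

```python
def cumulative(rewards):
    """Turn a list of per-step rewards into the cumulative-reward curve."""
    total, curve = 0.0, []
    for r in rewards:
        total += r
        curve.append(total)
    return curve

# A short run: two normal steps (-0.01 step cost each), one invalid
# action (-0.1), one more step, then a successful dropoff (+1.0).
rewards = [-0.01, -0.01, -0.1, -0.01, 1.0]
curve = cumulative(rewards)
```

The gradual drift, sharp drop, and upward jump in the list above map directly to the three curve shapes described.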
State panel (latest readout): Position (1,1) · Carrying 0/2 · Delivered 0/5 · Return 0.00
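The panel's fields hint at what a tabular state could look like: anything the agent needs to distinguish must be part of the Q-table key. A sketch under that assumption (the demo's actual state encoding may differ):

```python
from typing import NamedTuple

class State(NamedTuple):
    """One plausible tabular state, mirroring the panel's readouts."""
    x: int          # grid position
    y: int
    carrying: int   # items held, 0..2 per the "Carrying 0 / 2" readout
    delivered: int  # items delivered, 0..5 per the "Delivered 0 / 5" readout

s = State(x=1, y=1, carrying=0, delivered=0)
```

Because `NamedTuple` instances are hashable tuples, they work directly as dictionary keys in a Q-table.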
Use the arrow keys plus P (pick) and D (drop), or click the on-screen buttons.
Markov Decision Process. This is an empirical view from the latest rollout (not the full underlying MDP):
- Node color: how often the agent visited that state.
- Arrows: how often an action was taken from a state (movement actions only). Hover an arrow to see the state → action probability and the most recently observed next state.
Colors: P(state) from latest rollout; edges: P(action|state) ≥ 0.05
Run an episode (random or greedy) to populate probabilities.
(State graph: one node per cell of the 10×8 grid, x = 0–9, y = 0–7; every P(state) reads 0.000 until an episode is run.)
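The quantities the graph displays can be estimated from a single logged rollout by counting. A sketch, assuming the trajectory is logged as a list of (state, action) pairs (the demo's logging format is not shown):

```python
from collections import Counter, defaultdict

def empirical_probs(trajectory):
    """Estimate P(state) and P(action | state) from one rollout.

    trajectory: list of (state, action) pairs, in order.
    """
    n = len(trajectory)
    # P(state): fraction of timesteps spent in each state.
    p_state = {s: c / n for s, c in Counter(s for s, _ in trajectory).items()}
    # P(action | state): per-state action frequencies.
    counts = defaultdict(Counter)
    for s, a in trajectory:
        counts[s][a] += 1
    p_action = {s: {a: c / sum(cnt.values()) for a, c in cnt.items()}
                for s, cnt in counts.items()}
    return p_state, p_action

traj = [((0, 0), "right"), ((1, 0), "right"), ((1, 0), "up"), ((1, 1), "pick")]
p_state, p_action = empirical_probs(traj)
```

The rendered graph additionally hides edges with P(action|state) below the 0.05 threshold noted above, which is a simple filter over `p_action`.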