Training Data

Definition

The collection of examples an AI model learns from during training; its quality and quantity largely determine how well the model performs.

Training data is the set of examples used to teach an AI model. In machine learning, a model doesn’t follow hand-written rules — it learns patterns from this data, then applies them to new, unseen inputs.

Training data can be labelled (each example comes with the correct answer) or unlabelled (no answers are given, so the system has to find structure on its own). For a spam filter, the training data might be thousands of emails, each marked “spam” or “not spam”.
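To make the spam-filter case concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the four example emails, the names train and predict, and the word-counting approach, which is far simpler than what a real spam filter would use. It only shows the basic idea of learning from labelled examples and then applying what was learned to new text.

```python
from collections import Counter

# Hypothetical labelled training data: each example is (text, correct answer).
LABELLED_EMAILS = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]


def train(examples):
    """Count how often each word appears under each label."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def predict(counts, text):
    """Label new text by the class in which its words were seen more often."""
    words = text.split()
    spam_score = sum(counts["spam"][w] for w in words)
    ham_score = sum(counts["not spam"][w] for w in words)
    # Ties (including text with no known words) default to "not spam".
    return "spam" if spam_score > ham_score else "not spam"


model = train(LABELLED_EMAILS)
print(predict(model, "claim your free prize"))
print(predict(model, "agenda for the team lunch"))
```

Neither test sentence appears in the training data; the model labels them using only the word patterns it counted. With so few examples it is easy to fool, which is why real systems need far more data.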

Two principles follow from this:

  • Garbage in, garbage out — messy or unrepresentative data produces unreliable models.
  • Bias in, bias out — if the data reflects unfair patterns, the model can learn and repeat them.

That’s why gathering, cleaning, and checking training data is one of the most important parts of building AI. Learn more in how AI works.
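As a rough illustration of cleaning and checking, the sketch below removes empty and duplicate examples from a small labelled dataset and counts how many examples remain per label. The function name check_dataset and the sample data are hypothetical, and real projects run many more checks than these; a lopsided label count is only one simple hint that the data may be unrepresentative.

```python
from collections import Counter


def check_dataset(examples):
    """Drop empty and duplicate (text, label) pairs and count labels."""
    seen = set()
    cleaned = []
    dropped = 0
    for text, label in examples:
        # Compare texts ignoring case and surrounding whitespace.
        key = (text.strip().lower(), label)
        if not key[0] or key in seen:
            dropped += 1
            continue
        seen.add(key)
        cleaned.append((text.strip(), label))
    label_counts = Counter(label for _, label in cleaned)
    return cleaned, dropped, label_counts


# Hypothetical raw data containing one duplicate and one empty example.
raw = [
    ("Win a free prize now", "spam"),
    ("win a free prize now ", "spam"),
    ("", "not spam"),
    ("Meeting agenda for Monday", "not spam"),
    ("Free money, click now", "spam"),
]

cleaned, dropped, label_counts = check_dataset(raw)
print(f"kept {len(cleaned)}, dropped {dropped}")
print(dict(label_counts))
```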