A desirable property of an AI model that generally tanks accuracy.

An algorithmic model is "explainable" if it is possible for a human to understand why the model made a particular decision. Despite growing demand from the business community, most state-of-the-art AI models are not explainable; they are "black boxes".

Designing explainable AI models that are as accurate as non-explainable models is an active area of Computer Science research.

Let's look at an example.

Say we wanted to use an algorithm to recommend whether a student should be accepted to a particular college. A simple algorithm might make the decision based on a GPA cutoff.

GPA of 3.0 or higher => ACCEPT

GPA lower than 3.0 => REJECT

This model is explainable because it is easy to trace back a decision to the reason behind the decision.
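The cutoff rule above can be written as a few lines of code (a minimal sketch; the function name and the sample GPAs are made up for illustration):

```python
# Explainable model: one comparison, one threshold. Any decision can be
# traced directly back to the rule that produced it.
def recommend(gpa: float) -> str:
    return "ACCEPT" if gpa >= 3.0 else "REJECT"

print(recommend(3.4))  # ACCEPT
print(recommend(2.1))  # REJECT
```

Explaining any individual decision is trivial here: point at the single comparison the input passed or failed.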

Alternatively, we could train a statistical model on the historical data about all the applicants to the college and whether they were accepted or not. The model would be given all the data associated with students' applications (test scores, essays, recommendation letters, etc.) and it would be trained to predict whether a student was accepted or not. Once the model is trained, it could be used to make suggestions about future applicants.

Statistical models like this learn to identify patterns in the data that are predictive of the outcome they are trained on. If, by chance, our college had accepted all the applicants who had written essays about elephants, the AI model might suggest we accept all future students who write essays about elephants. This is clearly silly but totally possible, and with many models, it would be nearly impossible to catch.
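The elephant scenario can be simulated with a toy model. This sketch is entirely hypothetical: the dataset, the two features (scaled GPA and an "essay mentions elephants" flag), and the tiny hand-rolled logistic regression are all invented for illustration. Because the elephant flag happens to correlate perfectly with acceptance in this made-up data, the trained model leans heavily on it:

```python
import math

# Hypothetical historical data: (gpa / 4.0, essay_mentions_elephants) -> accepted?
# By construction, the elephant flag perfectly predicts acceptance here.
data = [
    ((0.95, 1), 1),
    ((0.60, 1), 1),
    ((0.90, 0), 0),
    ((0.55, 0), 0),
]

# Tiny logistic regression trained by stochastic gradient descent.
w = [0.0, 0.0]  # weights for [gpa, elephants]
b = 0.0
lr = 0.5
for _ in range(1000):
    for (x1, x2), y in data:
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        err = p - y
        w[0] -= lr * err * x1
        w[1] -= lr * err * x2
        b -= lr * err

# The spurious "elephants" weight dominates the GPA weight.
print(f"gpa weight={w[0]:.2f}, elephant weight={w[1]:.2f}")
```

With only two weights we can inspect the model and spot the problem, but a modern model with millions of learned parameters offers no such easy reading, which is exactly why spurious patterns like this are so hard to catch.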