In information theory, the cross entropy between two probability distributions $p$ and $q$ over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set, if a coding scheme is used that is optimized for an "unnatural" probability distribution $q$, rather than the "true" distribution $p$. The cross entropy for the distributions $p$ and $q$ over a given set is defined as follows:

$$H(p, q) = \operatorname{E}_p[-\log q] = H(p) + D_{\mathrm{KL}}(p \parallel q),$$

where $H(p)$ is the entropy of $p$, and $D_{\mathrm{KL}}(p \parallel q)$ is the Kullback-Leibler divergence of $q$ from $p$ (also known as the relative entropy of $p$ with respect to $q$; note the reversal of emphasis). For discrete $p$ and $q$ this means

$$H(p, q) = -\sum_x p(x) \log q(x).$$

With the logarithm taken to base 2, the result is measured in bits.
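
As a rough numerical illustration of the discrete case, the sketch below computes the cross entropy of a "true" distribution p against a coding distribution q and checks that it equals the entropy of p plus the Kullback-Leibler divergence. The function names and the example distributions are assumptions made for illustration and do not come from the text. Logarithms are taken to base 2 so results are in bits, and q is assumed to be nonzero wherever p is nonzero.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in bits; terms with p(x) = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) in bits. Assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log2 q(x). Assumes q(x) > 0 wherever p(x) > 0."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions over three events.
p = [0.5, 0.25, 0.25]   # "true" distribution
q = [0.25, 0.25, 0.5]   # distribution the code is optimized for

h_p = entropy(p)            # 1.5 bits
d_kl = kl_divergence(p, q)  # 0.25 bits
h_pq = cross_entropy(p, q)  # 1.75 bits

print(h_p, d_kl, h_pq)
# The identity H(p, q) = H(p) + D_KL(p || q) holds for this example.
print(math.isclose(h_pq, h_p + d_kl))
```

In this example, coding with q instead of p costs an extra 0.25 bits per event on average, which is exactly the divergence term. Swapping the arguments generally gives a different value, since cross entropy is not symmetric in p and q.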

