Journal of Mathematical Economics, 44: 24-32, 2008
Abstract: An observer of a process (x_t) believes that the process is governed by Q, whereas the true law is P. We bound the expected average distance between P(x_t|x_1,…,x_{t−1}) and Q(x_t|x_1,…,x_{t−1}) for t = 1,…,n by a function of the relative entropy between the marginals of P and Q on the first n realizations. We apply this bound to the cost of learning in sequential decision problems and to the merging of Q to P.
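To make the type of bound concrete, here is a minimal numerical sketch. It is not the paper's theorem, and its constants may differ from the paper's. It combines three standard facts: the chain rule for relative entropy, Pinsker's inequality, and Jensen's inequality. Together they give (1/n) Σ_t E_P ‖P(·|x_1,…,x_{t−1}) − Q(·|x_1,…,x_{t−1})‖_1 ≤ sqrt(2 D(P_n‖Q_n)/n), with D measured in nats. The setting is an assumption chosen for illustration: P is i.i.d. Bernoulli(p), and Q is the Bayesian mixture of i.i.d. Bernoulli laws under a uniform prior, so Q's one-step predictions follow Laplace's rule of succession. All function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k ones in n i.i.d. Bernoulli(p) draws."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_average_distance(p, n):
    """(1/n) * sum over t = 1..n of E_P of the L1 distance between
    P(x_t | history) and Q(x_t | history).

    Assumes 0 < p < 1. Under P the conditional law is Bernoulli(p).
    Under Q (uniform prior), after m observations containing k ones,
    the predicted probability of a one is (k + 1) / (m + 2).
    """
    total = 0.0
    for t in range(1, n + 1):
        m = t - 1  # number of past observations at stage t
        for k in range(m + 1):  # k = number of ones among them
            q_one = (k + 1) / (m + 2)
            # L1 distance between two Bernoulli laws is 2 * |p - q|.
            total += binom_pmf(k, m, p) * 2 * abs(p - q_one)
    return total / n


def relative_entropy(p, n):
    """D(P_n || Q_n) in nats, for the marginals on the first n realizations.

    Under the uniform prior, a given sequence with k ones has
    Q-probability k! (n - k)! / (n + 1)! = 1 / ((n + 1) * C(n, k)).
    """
    d = 0.0
    for k in range(n + 1):
        log_p_seq = k * math.log(p) + (n - k) * math.log(1 - p)
        log_q_seq = -math.log((n + 1) * math.comb(n, k))
        d += binom_pmf(k, n, p) * (log_p_seq - log_q_seq)
    return d


def pinsker_bound(p, n):
    """Upper bound sqrt(2 * D(P_n || Q_n) / n) on the average L1 distance."""
    return math.sqrt(2 * relative_entropy(p, n) / n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        avg = expected_average_distance(0.3, n)
        bound = pinsker_bound(0.3, n)
        print(f"n={n}: average distance={avg:.4f}, bound={bound:.4f}")
```

In this example the relative entropy grows far more slowly than n, so the bound on the average distance shrinks as n increases. This is the sense in which a relative entropy estimate controls how far the observer's predictions can be from the true conditional laws on average.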