TEXPLORE: Temporal Difference Reinforcement Learning