Constrained episodic reinforcement learning in concave-convex and knapsack settings