Generate long, consistent videos from a single reference image. This is a unified diffusion model for animating images that contain people.