AI-generated pixel art videos are blowing up right now. But which tool makes them best? To find out, I compared three tools using the actual videos I created with each.

How I put the AIs to work

🔨 Tools I used

  • Veo3 by Google DeepMind
  • Sora by OpenAI
  • Hailuo AI

❓ What I asked AI to create

I asked each AI tool to create a fantasy-themed pixel art video about exploring ancient ruins.

📝 The prompts I gave each tool

I tested three prompts at different levels of detail: simple, detailed, and micro-detailed. I wanted to see how each AI tool performs with more or less detail.

Simple prompt

First-person view of a player holding a glowing blue lantern while walking slowly through pixel-art ruins at midnight. Crumbling arches, glowing glyphs, and floating particles fill the moonlit air, with soft 16-bit ambient music and echoing footsteps in the dark.

Detailed prompt

“Exploring enchanted ruins beneath a glowing moon”

Shot: First-person view, pixel-art simulated lens, 12fps, slowly walk through uneven stone pathways

Subject: Player character’s hands visible. Left hand holding a dimly glowing blue lantern, right hand occasionally brushing aside pixelated vines or resting on a worn stone wall for balance

Scene: Midnight in a vine-choked, enchanted ruin hidden within a collapsed temple; broken archways, floating runes, and moonlight pouring through shattered ceilings into dust-filled air

Visual: Lantern casts soft teal pixel light onto moss-covered stone, glyphs on the walls glow faintly as player passes, magical particles hover in air; ghostly silhouettes flicker and vanish at the edge of visibility

Cinematography: Cool color palette of stone grays, luminous blues, and silver moonlight; mystical and quiet, evoking a sense of wonder and ancient forgotten power

Audio: 16-bit ambient fantasy music with echoing harp tones and airy synths; distant dripping water, soft wind blowing through corridors, and the lantern’s pulsing hum provide atmospheric layering

Micro-detailed prompt

"Through forgotten ruins beneath a fractured moon"

Shot: First-person view with a pixel-art simulated lens; locked at 12fps for a classic retro animation feel. The camera slowly walks forward, with subtle head bobbing and handheld-style shake adding realism as it glides steadily through narrow, collapsing stone paths. Each step reveals more of the crumbling corridor. Depth is emphasized with parallax layers of vines, foreground rubble, and distant silhouettes that shift slightly, enhancing the sense of motion and immersion.

Subject: Player’s hands wear worn explorer gloves, slightly trembling. The left hand tightly grips an ornate, rune-etched lantern emitting a soft cyan glow; the right steadies along the stone wall or swats aside floating motes. Occasional breathing fog drifts forward, adding a sense of life and urgency.

Scene: Midnight in a vast underground ruin overtaken by arcane growth, buried beneath a shattered temple. Massive arches loom above, with light from a broken moon filtering through the collapsed ceiling in sharp pixel beams. Runes hover midair, whispering magic. A dried fountain trickles with glowing blue liquid that defies gravity.

Visual: Lantern light casts pixelated flickers on moving vines and reflective stone. Magical glyphs animate as the player approaches, rotating or reacting to the light. Wisps drift around with trailing glow trails, and far off, a silhouette of a cloaked figure phases briefly in and out of view. Every motion has pixel-style atmospheric particles (dust, magic motes, lens shimmer).

Cinematography: Deep blues, muted greens, soft silvers, and selective glow layers compose a color script that evokes mystery and old-world magic. Pixel “bloom” around lantern light contrasts against shadowed corners. The tone is ethereal, quiet, and foreboding like something ancient is watching.

Audio: Layered 16-bit ambient score. shimmering harp loops, reverb-heavy synth pads, and subtle reverse chimes. The soundscape includes echoing water drips, footsteps on mossy stone, distant mechanical groans of the ruin shifting, and whispering glyphs that fade in and out. The lantern hums with magical tension, growing louder near certain glyphs.
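If you want to reuse the labeled structure from the detailed and micro-detailed prompts, here is a minimal Python sketch of one way to assemble it. This is not how I wrote the prompts above, and `build_prompt`, `SECTION_ORDER`, and the `skip` parameter are hypothetical names for illustration, not part of Veo3, Sora, or Hailuo. The section text is shortened from the detailed prompt.

```python
# Hypothetical helper: none of these names come from Veo3, Sora, or Hailuo.
SECTION_ORDER = ["Shot", "Subject", "Scene", "Visual", "Cinematography", "Audio"]


def build_prompt(title, sections, skip=()):
    """Join labeled sections into one text prompt, in a fixed order."""
    lines = [f'"{title}"']
    for label in SECTION_ORDER:
        if label in sections and label not in skip:
            lines.append(f"{label}: {sections[label]}")
    # One blank line between the title and each section.
    return "\n\n".join(lines)


detailed = build_prompt(
    "Exploring enchanted ruins beneath a glowing moon",
    {
        "Shot": "First-person view, pixel-art simulated lens, 12fps",
        "Scene": "Midnight in a vine-choked, enchanted ruin",
        "Audio": "16-bit ambient fantasy music",
    },
    skip=("Audio",),  # assumption: you may want to drop a section for some tools
)
print(detailed)
```

Keeping the sections in a dictionary makes it easy to add or remove one label at a time when moving between the simple, detailed, and micro-detailed levels.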

How the AIs handled each prompt

(Since Sora and Hailuo do not support audio in their text-to-video features, I excluded audio from the comparison.)

1️⃣ Simple prompt

Veo3

  • Best at understanding the prompt and adding detailed elements like crumbling arches and particles.
  • It even added a game UI based on the word “player,” though the UI was unstable.
  • Visuals felt a little cluttered, and video quality was poor.
  • Camera movement was the most natural.

Sora

  • Weakest overall.
  • The video felt flat, with very basic visuals and little environmental detail.
  • The camera movement was unnatural, more like a drifting camera than a first-person walk.

Hailuo

  • Best video quality of the three.
  • While it lacked the fine detail Veo3 offered, it balanced visual elements well and offered a clean, polished result.
  • Camera motion could be improved but was decent.

[Video: Simple prompt comparison]

💡 Simple prompt takeaway
For this level of prompt, Hailuo felt the most usable.

2️⃣ Detailed prompt

Veo3

  • Better visual balance with moss and particles adding atmosphere.
  • Missed some elements like the ghost silhouette, but still handled the prompt best despite lower video quality.
  • Tried to follow the prompt about hands, but the result felt awkward.

Sora

  • Showed the biggest improvement.
  • Camera movement reflected footsteps, and background details were much better than in the first try.
  • However, it still lacked variety and depth, and its understanding of the prompt remained weak.

Hailuo

  • Surprisingly weaker than before.
  • The scene felt flat and the first-person perspective was poorly executed.
  • While it tried to include all prompt elements, they felt disconnected and awkward.
  • It also followed the hand-related instructions, but the result was a bit off.

[Video: Detailed prompt comparison]

💡 Detailed prompt takeaway
Veo3 came out on top for this detailed prompt.

3️⃣ Micro-detailed prompt

Veo3

  • Captured nearly all elements from the prompt, though not perfectly.
  • Character and building detail improved, but background quality dropped.
  • Moonlight, which was well-rendered in the previous test, looked flat here.

Sora

  • Big step back.
  • Camera movement became stiff again, and most prompt details were missing.
  • Odd creative choices like floating lamps added confusion.

Hailuo

  • Better integration of prompt elements than before.
  • Slight improvement in first-person perspective, but lost depth and spatial realism due to lack of camera motion.

💡 Micro-detailed prompt takeaway
Micro-detailed prompts seemed to confuse the models. The more specific the request, the harder it was for the AIs to maintain consistency, realism, and focus.

Prompt-by-prompt winners

🏆 Winner for the simple prompt

Hailuo AI stood out for its fast generation time and clean, balanced visuals. While it lacked some detail, it delivered the most usable result at this level.

🏆 Winner for the detailed prompt

Veo3 handled the richer prompt best, striking a strong balance between background atmosphere and element placement. Despite lower video quality, it captured the mood and key visuals most effectively.

🏆 Winner for the micro-detailed prompt

None of the tools nailed it perfectly, but Veo3 managed to reflect the prompt most accurately. Even with visual inconsistencies, it stayed more focused than the others.

What I learned from testing all three

1. More detail isn’t always better

It’s tempting to cram every idea into a super-specific prompt, but going too far can backfire. Some AI tools get overwhelmed by micro-level instructions, which leads to awkward visuals or missing key elements altogether. Try to focus on what really matters in your scene.

2. Choose your tool based on your goal

Need something fast and decent? Go with Hailuo.

Want rich detail and mood? Veo3 is a strong pick.

Still curious to experiment? Sora is getting there, but it's not quite ready for complex prompts.

No single tool is perfect, so match your choice to the project, not just the hype.

3. Know the limits of AI video

Even the best models can get weird with abstract ideas, complex lighting, or subtle camera movements. Expect some quirks. The good news? You can often guide the result with clearer structure or by testing focused prompts. Trial and error is part of the fun.
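One way to test focused prompts is to split a big prompt into smaller ones, each with a fixed core plus a single extra section, so you can see which instruction a tool struggles with. The sketch below is only an illustration under that assumption: `focused_variants` and its `core` parameter are hypothetical names, and the section text is shortened from my detailed prompt.

```python
# Hypothetical sketch: split one big prompt into smaller test prompts.
def focused_variants(sections, core=("Shot", "Scene")):
    """Yield (label, prompt) pairs: the core sections plus one extra section each."""
    base = [f"{label}: {sections[label]}" for label in core if label in sections]
    for label, text in sections.items():
        if label in core:
            continue  # core sections are already part of every variant
        yield label, "\n\n".join(base + [f"{label}: {text}"])


sections = {
    "Shot": "First-person view, 12fps, slow walk",
    "Scene": "Midnight in an enchanted ruin",
    "Subject": "Left hand holding a glowing blue lantern",
    "Visual": "Glyphs glow faintly as the player passes",
}

for label, prompt in focused_variants(sections):
    print(f"--- testing: {label} ---")
    print(prompt)
```

Each variant can then be pasted into a tool on its own, which keeps the trial and error manageable.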

Which tool do you think did the best?