Context
Here's a prompt we use to update website copy based on a voice message: "You are a copywriter. The user wants to update their website. Apply their requested changes. Current copy: {current_copy}. User request: {transcript}". It works about 60% of the time. When it fails, it changes too much, changes too little, or misses the user's intent.
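For reference, here is a minimal sketch of how the template above might be filled in, assuming the two placeholders are substituted with plain string formatting. The function name `build_prompt` and the sample inputs are hypothetical and not taken from the real system.

```python
# The prompt text is copied from the brief; only the wrapping code is assumed.
PROMPT_TEMPLATE = (
    "You are a copywriter. The user wants to update their website. "
    "Apply their requested changes. Current copy: {current_copy}. "
    "User request: {transcript}"
)


def build_prompt(current_copy: str, transcript: str) -> str:
    """Fill the two placeholders of the original prompt template."""
    return PROMPT_TEMPLATE.format(current_copy=current_copy, transcript=transcript)


# Hypothetical inputs, for illustration only.
example = build_prompt(
    current_copy="Handmade candles, poured in small batches",
    transcript="Add that we ship internationally",
)
print(example)
```

Note that the template gives the model nothing beyond the raw copy and the raw transcript: no scope limits, no output format, and no guidance on ambiguous requests.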
Your Task
Rewrite this prompt to get consistent, high-quality results. Then design an eval: how would you measure whether your prompt is actually better? Which metrics, which test cases, and what threshold for "good enough"? Show your thinking and use AI to stress-test your approach.