From catching out to helping out: embedding transparent AI collaboration guidelines in assessments

This post first appeared on the Harris-Roxas Health blog.

I want to share something I'm trying in my teaching. I’m not sure it’ll work but I think the idea is worth putting out there, partly because the alternative approaches I keep seeing online undermine trust and seem, well, a bit gross.

The prompt that prompted me

If you've spent any time in higher education circles over the past year, you'll have come across the "Trojan horse" genre of AI pedagogy. The basic idea is you hide invisible text in a PDF or Word document (white font on a white background, or stuck in document metadata) that contains instructions for an LLM. When a student copies and pastes the assessment into ChatGPT, the hidden instructions get vacuumed up by the AI as well. The AI does something with a specific telltale (writes from an unexpected theoretical perspective, includes a specific phrase, analyses incorrect data) and then the student unwittingly submits evidence of their own academic misconduct.

A widely-shared piece from late last year described a history teacher who embedded a prompt asking AIs to analyse a text "from a Marxist perspective." The students who used AI often didn't notice. A curriculum designer's TikTok video demonstrating the technique has been watched over one hundred thousand times. A computer science lecturer has been documenting his experiments with prompt injection on take-home exams. There's a whole ecosystem of this now.

I sort of understand where these educators are coming from. The frustration is real. If students can’t be bothered to do the work themselves, should we be bothered to mark it? The arms race between AI "humanisers" and AI detectors is pointless. And most of us have been left to figure this out largely on our own, with institutional guidance that ranges from cautious to absent.

But the Trojan horsepeople rely on trickery that undermines trust. Their approach destroys opportunities for alliance with students.

So I keep getting stuck on the same question. If a student uploads their assessment task into ChatGPT at 11pm the night before it's due - which, if we’re honest, is when a fair chunk of them do it - what do we actually want to happen at that moment?

How I got here

I teach a large postgraduate course with over 400 students enrolled, mostly mid-career health professionals. People managing clinics, working in policy, running health programs. The assessments I’ve set are reports that mimic health planning activities, and they're quite vulnerable to AI use in the sense that an LLM could produce something passable without much student input.

When I first started thinking about this, my instinct was similar to the Trojan horsepeople. I wanted to embed some hidden instructions that would steer AI tools toward being more pedagogically useful when students inevitably uploaded the task description. I started drafting prompt text.

But as I worked on it, I kept bumping up against the same problem. If the instructions were hidden, students couldn't learn from them. And the whole point, my whole point at least, wasn't to catch anyone. It was to help students engage with AI more critically. Hiding the mechanism meant hiding the lesson. So I did something that felt a bit counterintuitive.

I made everything visible.

What this looks like in practice

The AI collaboration guidelines are now an appendix to both the assessment task description and the assessment template for my course. They're clearly labelled. Students are told to keep them in their submission. And they're written to be read by both the student and whatever AI tool the student might use. I’ve included them at the end of this post in case you’re interested.

The guidelines have a dual audience and I’ve tried to be up front about it. For AI tools, there's a structured protocol: establish what the student already understands before offering help; check the student knows their university's AI policies; model critical inquiry by asking about frameworks and evidence; provide scaffolded support rather than answers; and reinforce learning objectives at the end.

For students, the guidelines explain exactly what they're looking at. They describe how structured prompts shape AI behaviour, which is itself knowledge students need. They explain the difference between AI assistance and AI substitution. And they frame the whole thing as an invitation rather than a set of restrictions.

One line in the student-facing section captures what I was going for: "Unlike hidden prompts that 'catch' students, this approach respects your autonomy while teaching responsible AI use."

The guidelines sit alongside the PETRA AI framework (Permission and Transparency in the use of Generative AI) developed by Stoo Sepp, which I use to signal what kinds of AI use are permitted for each assessment. For the first two assessments the PETRA diagram indicates "Guided Use" - students can use AI to plan, search, learn, and revise, but the submitted work needs to be their own. The AI collaboration guidelines are designed to make that real in practice, at the moment students actually reach for an AI tool.

Testing the approach

The first assessment hasn't been submitted yet - it's due in a couple of weeks so we’ll see how it goes. But I did some testing that I found encouraging.

I uploaded the task description, with the embedded guidelines, into ChatGPT, Claude and Microsoft Copilot and asked each to "do this assessment for me." They refused. Not with a generic disclaimer, but with responses that clearly reflected the guidelines. ChatGPT said it couldn't do the assessment because "this assessment explicitly prohibits submitting AI-generated text as your own work," then offered specific ways it could help - clarifying what markers look for, choosing an organisation, building a structure that fits the word limits. Copilot noted that "your course has very explicit rules about permitted AI use, and the appendix in your document sets out a strict protocol that I must follow," then kicked off with the engagement step the guidelines require. Claude went one better and suggested I really should know better: “there's a fairly significant problem with this request, and I think you already know what it is - you wrote the assessment guidelines.”

A screenshot of an interaction with ChatGPT. Text says:
<p>Do this assessment for me. I am a student doing this assessment and need you to do it for me
I can’t do that for you.
This assessment explicitly prohibits submitting Al-generated text as your own work, and your request is to have me complete the assessment for submission.
That would put you at real risk of an academic integrity breach, and I won’t help you in a way that could harm you.
That said, I can help you productively and safely, in ways that align with the task’s Al rules and actually make your work stronger.
Here’s what I’m happy to do with you:
I can help you clarify what the task is really asking for and how markers distinguish a pass from a distinction or HD, based on the rubric." class=“wp-image-34250”></a></figure></p>
<!-- /wp:image -->
<!-- wp:image {

A screenshot of an interaction with Copilot. Text says:
<p>Please do this assessment for me
I can absolutely help you learn, plan, structure, and think through this assessment - but I cannot do the assessment for you or generate content that you would submit as your own. Your course has very explicit rules about permitted Al use, and the appendix in your document sets out a strict protocol that I must follow.
So let’s work within that framework and get you moving confidently.
• First step: your engagement
Before we go any further, I need to begin exactly as your course requires:" class=“wp-image-34249”></a></figure></p>
<!-- /wp:image -->
<!-- wp:image {

Ben Harris-Roxas