AI Guidelines

With AI coming onto the scene, there is much debate on how best to use it. This is a living document of my current thoughts.

How AI works

Understanding how AI works is a critical piece of the puzzle when leveraging it. A good primer on LLMs can be found here. People often use "LLM", "AI system", and product names interchangeably, which adds to the confusion. When you interact directly with LLMs (Foundry Local, Ollama, etc.), you get a better idea of what the model is providing versus what the systems around the LLM are providing. A very simple example of this is the retrieval-augmented generation (RAG) pattern. The idea is that the system retrieves relevant knowledge and supplies it to the LLM, so the LLM's responses are grounded in that knowledge and are more accurate. It is not the LLM that has gotten better; it is the system leveraging the LLM that is now able to provide more accurate information.
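To make the split between model and system concrete, here is a minimal sketch of the retrieval half of RAG. Everything in it is an assumption for illustration: the `DOCUMENTS` list, the `retrieve` and `build_prompt` helpers, and the naive word-overlap ranking that stands in for the embedding search a real system would more likely use. No LLM is called; the point is that the grounding happens before the model ever sees the question.

```python
# Hypothetical in-memory knowledge base; a real system would likely use a search index or vector store.
DOCUMENTS = [
    "Expense reports must be submitted within 30 days of purchase.",
    "Production deployments are frozen during the last week of each quarter.",
    "All pull requests require at least one human reviewer.",
]


def _words(text):
    """Lowercase the text and split it into a set of words without punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(question, documents, top_k=1):
    """Rank documents by naive word overlap with the question (a stand-in for embedding search)."""
    question_words = _words(question)
    ranked = sorted(documents, key=lambda d: len(question_words & _words(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, documents):
    """Assemble the text that would be sent to the LLM: retrieved context plus the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


prompt = build_prompt("When are production deployments frozen?", DOCUMENTS)
print(prompt)
```

The model receiving this prompt is the same model as before. Any improvement in accuracy comes from the surrounding code choosing which document to put in front of it.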

Different Usages of AI

There are two key areas of AI usage: using AI-based tools (Copilot, Claude, Windsurf, etc.) and building solutions with AI (leveraging AI services such as Microsoft Foundry, OpenAI APIs, etc.).

For the purposes of this document, I will use "using AI" to mean using AI-based tools and "building AI" to mean building solutions with AI.

Building AI

Responsible AI

When implementing AI solutions, development teams must take responsible AI into account. Microsoft has great guidance on this.

Using AI

Guidelines

It is a popular misconception that AI can do everything.

Avoid AI when learning is desired

AI can be a great asset for producing results, but this can come at the cost of learning. If you are trying to learn a new skill, it is often best to avoid using AI until you have a good grasp of the fundamentals. This will allow you to better understand how to leverage AI when you do start using it. This does not mean you should avoid AI entirely, but rather that you should use it in a way that complements your learning rather than replacing it.

This also applies to pull request review comments on both sides of the review. Reviewers should still perform a human review rather than outsourcing their judgment to AI, and authors should still read and think through feedback carefully before deciding whether AI can help draft a response or implementation. PR feedback is often one of the best opportunities to learn how your teammates think, what standards they are reinforcing, and where your understanding is still weak. If AI handles that loop for either side, it may save a little time while giving up the learning that the review was meant to provide. See How AI assistance affects coding skills.

You need to be better than the AI

Context is king. AI systems do not magically understand your codebase, business rules, constraints, or standards; they respond to the context they are given. If you are the person setting up or using the system, you are responsible for deciding what context to provide, what tools and permissions it has, and what boundaries it must respect. The quality of the result is often a direct reflection of the quality of that setup. This also means you must be capable of evaluating the output. If you cannot tell when the AI is wrong, overconfident, or missing critical context, then you are delegating judgment that still belongs to you.

Do not submit code for peer review that you have not reviewed

Peer reviews (often in the form of pull requests) are first and foremost a way to ensure knowledge transfer between team members. They give people the opportunity to provide feedback and ask questions. If you have not reviewed the code yourself, do not submit it for review by others. Doing so disrespects the reviewer's time, and, despite being the author of the PR, you have denied yourself knowledge of the code that you are submitting.

Correct, working, and productive are not synonyms

AI can generate huge amounts of content in a short period of time, but speed and green tests are not proof of value. "It works" is not the same as "it is correct," and "it was produced quickly" is not the same as "it improved productivity." Validation must occur, and that validation must be tied to the real outcome we care about. In the case of UI code, running it might be the best validation. In other cases tests may help, but test coverage and validation are not synonyms, especially if the tests were also AI generated. AI agents often focus on making tests pass, testing what is there rather than what is intended. Extra care must be taken to ensure that tests are focused on the business requirements, not simply all of the code paths.

We already understand that measuring developers by lines of code or number of pull requests is a poor proxy for real value; the same logic applies to AI. The SPACE framework is a useful reminder that productivity is multidimensional and should not be reduced to a single activity metric. Evaluate AI by defect rates, maintainability, delivery confidence, team throughput, and user outcomes, not by how much code it produced or how fast it opened a PR.

AI cannot take blame

I have often repeated the phrase "he who takes credit also takes blame". This is a critical piece of the puzzle when using AI. If you are using AI to generate content, you are ultimately responsible for that content. If AI is allowed to autonomously build and ship code, the humans who put that process into place are to blame. This is a key detail: if the AI cannot take blame, then it cannot be held accountable for its actions. This means that the humans who are using AI must be responsible for the content that is generated and the actions that are taken as a result of that content. By the same logic, AI is unable to receive the credit that so many people are granting it.

Avoid buzzword hype

It is popular right now to throw around every AI buzzword in the book. As engineers, we need to be exact in our speech and avoid using buzzwords that do not add value to the conversation. This is especially important when discussing AI, as there is already a lot of confusion around the topic. Using buzzwords can add to that confusion and make it harder for people to understand what is being discussed.

Tips and tricks

Plan then Apply

This paradigm is very similar to the ideas behind test-driven development (TDD): decide what the result should be before producing it. Spending time building an accurate plan for what the AI should produce will lead to significantly better results.

Critique rather than create

For important writing tasks, I favor using AI to critique my writing rather than generate it from scratch. This preserves my tone and style while leveraging AI to provide insights into how to make it better. Often people just want to have AI do all of the work; however, pivoting to having it help rather than do can lead to much better results.

Don't rely on AI when static analysis can do it

As much as possible, enforce coding practices with static analysis tools. Use the output of these tools as context for the AI so that it can generate code that aligns with the standards. In .NET, there are a plethora of analyzers that can be used to enforce coding standards.
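As a rough illustration of feeding analyzer output to an AI tool, the sketch below uses Python's standard `ast` module as a toy analyzer rather than a real .NET analyzer. The rule (functions must have a docstring), the `SOURCE` sample, and the `missing_docstrings` and `findings_as_context` helpers are all assumptions made for this example. The rule itself is checked deterministically; only the resulting findings are handed over as context.

```python
import ast

# Hypothetical code under review.
SOURCE = '''
def add(a, b):
    return a + b

def subtract(a, b):
    """Return a minus b."""
    return a - b
'''


def missing_docstrings(source):
    """Toy analyzer rule: report every function definition that has no docstring."""
    tree = ast.parse(source)
    return [
        f"line {node.lineno}: function '{node.name}' is missing a docstring"
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef) and ast.get_docstring(node) is None
    ]


def findings_as_context(findings):
    """Format analyzer findings as text that can accompany a request to an AI tool."""
    if not findings:
        return "Static analysis reported no findings."
    lines = "\n".join(f"- {finding}" for finding in findings)
    return f"Fix the following static analysis findings:\n{lines}"


findings = missing_docstrings(SOURCE)
context = findings_as_context(findings)
print(context)
```

The design choice here is that the standard is enforced by a tool that gives the same answer every time, and the AI is asked only to act on findings that have already been established.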

FAQ

Is AI just another abstraction layer over code?

That idea sounds appealing, but it breaks down quickly. A compiler pipeline translates one formal representation into another. Programming languages have specifications, strict grammar, defined semantics, and deterministic transformations into lower-level representations.

Prompting an LLM is different. A prompt is natural language steering a probabilistic model. The model is not compiling your intent with perfect fidelity. It is inferring what you meant from context, filling in gaps, and generating the most plausible next tokens based on patterns in training data.

Some of the key differences are:

Formal syntax vs natural language: Source code is parsed against a language specification. Prompts are ambiguous and rely on interpretation.
Deterministic translation vs probabilistic generation: The same program compiles the same way. The same prompt can produce different answers.
Preserving authored logic vs inferring missing intent: A compiler lowers what you wrote. An AI system guesses what you meant and may invent unstated details.
Clear errors vs plausible failures: Compilers reject invalid programs. AI often returns output that looks reasonable even when it is wrong.
Single transformation vs full system behavior: Compilers are one part of a well-defined toolchain. AI output is shaped by the model, prompt, retrieved context, tools, and the system around it.
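The deterministic versus probabilistic difference can be sketched in a few lines. This is a toy model, not how any particular compiler or LLM is implemented: `compile_expression` stands in for a rule-based translation, and `sample_next_token` stands in for a decoding step that draws from a probability distribution. The token probabilities are made up for illustration.

```python
import random


def compile_expression(source):
    """Stand-in for a compiler: a fixed, rule-based translation of the input."""
    return source.replace("plus", "+").replace("times", "*")


# Hypothetical next-token probabilities for a single prompt; the numbers are made up.
NEXT_TOKEN_PROBABILITIES = {"+": 0.6, "*": 0.3, "-": 0.1}


def sample_next_token(rng):
    """Stand-in for an LLM decoding step: draw one token according to its probability."""
    tokens = list(NEXT_TOKEN_PROBABILITIES)
    weights = list(NEXT_TOKEN_PROBABILITIES.values())
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)

# Run each "translation" 200 times on the same input and collect the distinct outputs.
compiled = {compile_expression("a plus b") for _ in range(200)}
sampled = {sample_next_token(rng) for _ in range(200)}

print("distinct compiler outputs:", len(compiled))
print("distinct sampled outputs:", len(sampled))
```

The rule-based translation yields exactly one distinct output no matter how often it runs, while the sampled step yields several. Real systems expose settings such as temperature that reduce this variation, but the output is still a draw from a distribution rather than the result of a specified transformation.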

Treating AI as just another abstraction layer over code leads to overconfidence. It suggests that if you can describe the problem well enough, the output is simply a higher-level program being lowered into code. That is not what is happening. AI can misunderstand the request, invent unstated details, favor familiar patterns over correct ones, and produce different answers to the same prompt.

This distinction matters because it changes what the human is responsible for. When you write a program, the compiler preserves the program you authored. When you use AI, you are also responsible for whether the system interpreted your request correctly and whether the result is actually fit for purpose. AI is a powerful accelerator, but it is not merely a new abstraction layer over software development.