More and more developers use AI tools such as ChatGPT to write code.
This guide covers the common signs of AI-written code and how to check whether code was likely produced by ChatGPT or a similar tool.
Here’s a simple prompt you can copy into ChatGPT when you want to check whether a piece of code was written by AI:
Prompt Template for AI Code Detection:
I need you to analyze this code snippet to determine if it was likely generated by an AI:
[Paste code here]
Please evaluate:
- Code structure and patterns
- Variable naming conventions
- Comment style and frequency
- Error handling approaches
- Overall consistency
Provide:
- Confidence level of AI generation (percentage)
- Specific indicators found
- Detailed explanation of reasoning
Common Signs of AI-Generated Code
1. Overly Generic Variable Names
AI models often use very basic variable names like ‘data’, ‘result’, or ‘temp’. Human developers typically choose more specific names that reflect what the variable actually contains. For example:
Human-written code might use:
```python
monthly_sales_total = calculate_sales(january_data)
```
While AI-generated code might use:
```python
result = calculate(data)
```
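If you want to automate this first check, a short script can list generic names that appear as assignment targets. This is a minimal sketch built on Python's standard `ast` module. The `GENERIC_NAMES` set and the `find_generic_names` helper are illustrative assumptions, not an established detector, and a hit is only a hint.

```python
import ast

# Illustrative list of names this guide calls generic; adjust it for your codebase.
GENERIC_NAMES = {"data", "result", "temp", "tmp", "value", "output"}


def find_generic_names(source: str) -> list[str]:
    """Return the generic names that `source` assigns to, sorted and without duplicates."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        # A Name in Store context is an assignment or loop target, not a read.
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in GENERIC_NAMES:
                found.add(node.id)
    return sorted(found)


print(find_generic_names("result = calculate(data)"))  # ['result']
print(find_generic_names("monthly_sales_total = calculate_sales(january_data)"))  # []
```

Only assignment targets are checked here, so `data` passed as an argument is ignored; function parameters are also not covered by this sketch.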
2. Consistent Formatting Patterns
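ChatGPT tends to write code with very consistent formatting. That might seem like a good thing, but it can also be a clue: humans usually show small variations in their coding style, even when following a style guide, while AI-generated code often looks “too perfect” in its formatting. Keep in mind that auto-formatters such as Black or Prettier produce the same uniformity, so this sign is weak on its own.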
3. Basic Error Handling
AI models usually implement very simple error handling. They often use generic try/except (try-catch) blocks without specific exception types or informative error messages. Human developers typically write more targeted error handling based on their experience with real-world failures.
Example of AI-generated error handling:
```python
try:
    do_something()  # placeholder for the operation being guarded
except:
    print("An error occurred")
```
Example of human-written error handling:
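```python
import logging

logger = logging.getLogger(__name__)

class ConfigurationError(Exception): pass  # illustrative app-specific errors
class AccessDeniedError(Exception): pass

def load_config(config_path):
    try:
        with open(config_path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError as exc:
        logger.error("Config file missing at path: %s", config_path)
        raise ConfigurationError("Missing required config file") from exc
    except PermissionError as exc:
        logger.error("No permission to access file: %s", config_path)
        raise AccessDeniedError("Cannot access config file") from exc
```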
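The bare `except:` pattern is easy to spot mechanically. The sketch below assumes Python source and uses the standard `ast` module to report the line of every bare `except:` or `except Exception:` handler. `find_broad_handlers` is a hypothetical helper name, and a broad handler can still be a deliberate human choice.

```python
import ast

BROAD_TYPES = {"Exception", "BaseException"}


def find_broad_handlers(source: str) -> list[int]:
    """Return line numbers of bare `except:` and `except Exception:` clauses."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        # node.type is None for a bare `except:`; otherwise it is the caught expression.
        if node.type is None or (
            isinstance(node.type, ast.Name) and node.type.id in BROAD_TYPES
        ):
            lines.append(node.lineno)
    return sorted(lines)


snippet = '''try:
    do_something()
except:
    print("An error occurred")
'''
print(find_broad_handlers(snippet))  # [3]
```

Because the code is only parsed, never executed, undefined names such as `do_something` do not cause errors.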
4. Standard Solutions
ChatGPT often provides the most common or standard solution to a problem, even when there might be better alternatives. It tends to use well-known design patterns and common library functions rather than creative or situation-specific solutions.
5. Obvious or Shallow Comments
AI-generated code usually includes either very basic comments or too many obvious comments. Human developers typically write comments that explain the “why” behind complex logic, while AI tends to comment on the “what” that is already clear from the code.
Example of AI-generated comments:
```python
total = 0
# Loop through the array
for i in range(len(array)):
    # Add the current number to total
    total += array[i]
```
Example of human-written comments:
```python
total = 0
# Skip first element to avoid counting the header row
# See ticket PROJ-123 for background
for i in range(1, len(array)):
    total += array[i]
```
Common Patterns in ChatGPT Code
1. Boilerplate Heavy
ChatGPT tends to include a lot of boilerplate code. It often generates complete class structures and import statements, even for simple examples. This can make the code look more professional, but also more standardized and less tailored to specific needs.
2. Predictable Structure
AI-generated code usually follows very predictable patterns in how it structures functions and classes. For example, it might always put class variables in the same order or organize methods in a specific way.
3. Limited Use of Advanced Features
ChatGPT tends to stick to basic language features and avoid newer or more complex ones, likely because most of the code it learned from predates those features or rarely uses them. In Python, for example, it may not use the walrus operator (added in Python 3.8) or structural pattern matching (added in Python 3.10).
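To see whether a Python snippet uses these newer features at all, you can walk its syntax tree. This is a rough sketch: `uses_newer_syntax` is a hypothetical helper, `ast.Match` exists only on Python 3.10 and later, and the absence of these features proves nothing, since many human developers avoid them too.

```python
import ast


def uses_newer_syntax(source: str) -> dict:
    """Count walrus operators (:=) and match statements in `source`."""
    counts = {"walrus": 0, "match": 0}
    # ast.Match only exists on Python 3.10+; use None on older interpreters.
    match_type = getattr(ast, "Match", None)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.NamedExpr):
            counts["walrus"] += 1
        elif match_type is not None and isinstance(node, match_type):
            counts["match"] += 1
    return counts


print(uses_newer_syntax("if (n := len(items)) > 10:\n    print(n)\n"))
```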
Technical Detection Methods
Some detection tools measure the perplexity of code, that is, how predictable each token is to a language model. Lower perplexity scores often indicate AI generation, because AI models tend to produce statistically likely code patterns.
Detection effectiveness also depends on code length: accuracy generally improves with longer samples, and reliable analysis typically requires at least 100 tokens.
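Perplexity is the exponential of the average negative log-probability a language model assigns to each token: PPL = exp(-(1/N) Σ log p(token_i | earlier tokens)). The sketch below computes it from a list of per-token log-probabilities and enforces the 100-token minimum mentioned above. Obtaining those log-probabilities from a scoring model is assumed and not shown, and `perplexity` and `MIN_TOKENS` are hypothetical names.

```python
import math

MIN_TOKENS = 100  # minimum sample size from the guideline above


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities: exp of the mean negative log-likelihood."""
    n = len(token_logprobs)
    if n < MIN_TOKENS:
        raise ValueError(f"need at least {MIN_TOKENS} tokens for a reliable score, got {n}")
    mean_nll = -sum(token_logprobs) / n
    return math.exp(mean_nll)


# Toy inputs: every token gets the same probability from the (assumed) scoring model.
print(perplexity([math.log(0.5)] * 100))  # about 2.0
print(perplexity([math.log(0.9)] * 100))  # about 1.11, i.e. more predictable
```

Real detectors compare scores like these against thresholds calibrated on known human and AI samples; this sketch deliberately leaves that calibration out.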
Best Practices for Detection
1. Comprehensive Analysis
Examine multiple aspects of the code:
- Pattern recognition
- Syntax analysis
- Logic errors and inconsistencies
- Documentation style
2. Context Consideration
Consider the programming language and project context when analyzing code, as detection accuracy can vary across different languages and frameworks.
How to Test if Code is AI-Generated
1. Check for Documentation
Look at how the code is documented. AI-generated documentation often includes:
- Very generic descriptions
- Missing real-world context
- No references to business logic or specific use cases
- Perfect but shallow formatting
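One crude way to catch generic descriptions is to flag docstrings that add nothing beyond the function's own name. This sketch assumes Python code; the `restating_docstrings` helper and its stop-word list are illustrative assumptions, and it misses simple variations such as “Calculates” versus “calculate”.

```python
import ast
import re

# Filler words ignored when comparing a docstring to the function name (illustrative).
STOPWORDS = {"the", "a", "an", "this", "function", "method", "of", "and", "to", "for", "given", "return", "returns"}


def restating_docstrings(source: str) -> list[str]:
    """Names of functions whose docstring only repeats words from the function name."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            if not doc:
                continue
            name_words = set(node.name.lower().split("_"))
            doc_words = set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS
            # Flag when every meaningful docstring word already appears in the name.
            if doc_words and doc_words <= name_words:
                flagged.append(node.name)
    return flagged


example = '''
def calculate_total(items):
    """Calculate the total."""
    return sum(items)
'''
print(restating_docstrings(example))  # ['calculate_total']
```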
2. Look for Complexity Patterns
AI-generated code usually shows these patterns:
- Over-simplified solutions for complex problems
- Missing edge case handling
- Only basic security considerations
- Standard library usage instead of specialized solutions
3. Test Error Cases
Try breaking the code by:
- Providing unexpected input
- Testing edge cases
- Checking error handling
- Looking at how it handles null or undefined values
AI-generated code often fails in unexpected ways when dealing with these situations.
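A small harness makes this kind of probing repeatable. In this sketch, `probe` calls a function with each input and records either the result or the exception type, and `average` is a hypothetical function written in the naive style described above.

```python
def probe(func, inputs):
    """Call `func` on each input; record (input, "ok", result) or (input, "error", exception name)."""
    outcomes = []
    for value in inputs:
        try:
            outcomes.append((value, "ok", func(value)))
        except Exception as exc:  # deliberately broad: we want to observe every failure mode
            outcomes.append((value, "error", type(exc).__name__))
    return outcomes


# Hypothetical function under test, written without any input validation.
def average(numbers):
    return sum(numbers) / len(numbers)


for value, status, detail in probe(average, [[1, 2, 3], [], None, "12"]):
    print(repr(value), status, detail)
```

Here an empty list raises `ZeroDivisionError`, and `None` or a string raises `TypeError`: edge cases the naive version never considered.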
Best Practices When Using AI-Generated Code
1. Always Review and Modify
When working with AI-generated code, never use it without checking it first. Take time to read through the code and understand what it does.
Start by changing those basic variable names to ones that make sense for your project. While reviewing, look at how the code handles errors and improve it where needed – add clear error messages that will help you track down problems later.
Good comments are essential – write ones that explain why the code works the way it does, not just what it does.
Remember to test the code with different kinds of input to make sure it works in all situations. Most importantly, adapt the code to fit your specific business needs, since AI doesn’t know your exact situation.
2. Use AI as a Starting Point
Think of AI-generated code as a rough draft or outline. It gives you a good place to start, but you need to build on it.
The code from AI tools often has very basic error handling, so you’ll need to improve it by adding specific error checks and helpful error messages.
Take time to look carefully at security issues and add extra protection where needed. Remember that AI gives you general-purpose code, but your project needs specific solutions.
Add proper logging so you can track what’s happening when the code runs in the real world, and set up ways to monitor if everything is working correctly.
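As a starting point, Python's standard `logging` module is enough to record what the code does at run time. The logger name `orders` and the `process_order` function are hypothetical, and a real deployment would usually route these records to a file or a log aggregation service rather than the console.

```python
import logging

# Console logging with timestamps; production code would usually add a file or remote handler.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("orders")  # hypothetical component name


def process_order(order_id: str, quantity: int) -> bool:
    """Hypothetical business function instrumented with log records."""
    logger.info("Processing order %s (quantity=%d)", order_id, quantity)
    if quantity <= 0:
        # A warning leaves a searchable trace of rejected input.
        logger.warning("Rejected order %s: non-positive quantity %d", order_id, quantity)
        return False
    logger.info("Order %s accepted", order_id)
    return True


process_order("A-1001", 3)
process_order("A-1002", 0)
```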
3. Document AI Usage
Good documentation is key when working with AI-generated code. Add comments at the start of AI-generated sections to mark them clearly.
When you make changes, write down what you changed and why – this helps other developers (or yourself in the future) understand the code better.
Keep track of which parts were written by humans and which came from AI. This makes it much easier to maintain the code later.
Write down how you tested the code and what results you got, so everyone understands what the code does and how reliable it is.
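If your team marks AI-generated sections with a fixed comment, a short script can list them for reviewers. The `# AI-GENERATED` marker here is a hypothetical convention, not a standard, and `find_ai_markers` is an illustrative helper that only scans `.py` files.

```python
from pathlib import Path

MARKER = "# AI-GENERATED"  # hypothetical team convention, not a standard


def find_ai_markers(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, comment text) for every marker comment under `root`."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            stripped = line.strip()
            if stripped.startswith(MARKER):
                hits.append((str(path), lineno, stripped))
    return hits


for file, lineno, text in find_ai_markers("."):
    print(f"{file}:{lineno}: {text}")
```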
Remember that AI is a tool to help you write code faster, but it shouldn’t replace careful review and customization for your specific needs. Good documentation and testing are just as important with AI-generated code as they are with human-written code.
By following these practices, you’ll get the best results from using AI while maintaining high-quality, reliable code.