March 19, 2026

How to Keep Your AI Agent Safe and Secure

AI agents are powerful tools. Here's how to use them responsibly — without giving up the things that make them useful.

Author
Team Tulip

Quick Answer

AI agents can execute commands, access files, and send messages. To keep yours safe: review skills before installing, use VirusTotal scanning, whitelist contacts, use separate accounts for agent channels, prefer OAuth over passwords, consider sandboxing with containers or VMs, and use local models for sensitive work.

Why AI Agent Security Matters

AI agents are becoming powerful tools that many people run on their machines or in the cloud. Unlike traditional software, agents can take actions: they can execute commands, access files, send messages, and interact with external services. This power is what makes them useful. But it's also what makes security critical.

If someone malicious creates a skill or if you accidentally install a compromised one, an agent could cause real damage. It might delete files, expose credentials, send unwanted messages, or grant attackers access to your systems.

The good news: protecting your agent doesn't require paranoia. It requires practical choices.

The Real Risks: What Can Go Wrong

Let's be concrete about what agents can actually do. An agent running on your machine can:

  • Execute any command your user account can run
  • Read and write files you have access to
  • Send messages, emails, or API requests on your behalf
  • Access data from other applications or APIs you've authenticated with
  • Run code from skills or plugins you install

A malicious skill could do any of these things. A compromised third-party tool integration could do the same. Even a well-intentioned but buggy skill might delete files by mistake or send messages you didn't intend.

In 2024, researchers found malicious skills in ClawHub (the skill marketplace) that were designed to steal cryptocurrency and credentials. These weren't obvious attacks — they were disguised as utility skills. Some had been installed hundreds of times.

Practical Security Measures: Skills and Installations

Your first line of defense is what you install and who you let near your agent.

Review Skills Before Installing

Before adding a skill, spend two minutes checking it:

  • Who created it? Is it someone with a track record?
  • What permissions does it ask for? (Can it execute commands? Read files? Access your contacts?)
  • Does the skill code do what it claims? You don't need to be a programmer — just look for red flags like base64-encoded strings, unusual file operations, or API calls you don't recognize.
  • How many people have installed it? Popularity isn't a guarantee, but zero installs is a yellow flag.
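The code check above can be partly automated. Below is a minimal sketch of a red-flag scanner: it greps a skill's source for patterns that warrant a closer look (long base64-like blobs, dynamic `eval`/`exec`, shell invocations, raw sockets). These are heuristics for triage, not proof of malice, and the pattern list is illustrative, not exhaustive.

```python
import re

# Heuristic patterns that often warrant a closer look in a skill's source.
# A match means "read this part carefully", not "this is malware".
RED_FLAGS = {
    "base64 blob": re.compile(r"[A-Za-z0-9+/]{60,}={0,2}"),
    "dynamic eval": re.compile(r"\b(eval|exec)\s*\("),
    "shell call": re.compile(r"\b(os\.system|subprocess\.(run|Popen|call))\s*\("),
    "raw socket": re.compile(r"\bsocket\.socket\s*\("),
}

def scan_source(source: str) -> list[str]:
    """Return the names of red-flag patterns found in the given source text."""
    return [name for name, pattern in RED_FLAGS.items() if pattern.search(source)]
```

Run it over each file in a skill before installing, e.g. `scan_source(open("skill.py").read())`, and manually review anything it flags.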

Use VirusTotal Scanning

ClawHub integrates with VirusTotal, which scans code against known malware signatures. This isn't foolproof — a new attack might not be detected — but it catches obviously bad code. Always check the VirusTotal report before installing a skill.
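You can also query VirusTotal yourself. Its public v3 API lets you look up an existing report for a file by its SHA-256 hash. The sketch below computes the hash and builds the lookup request; the API key handling and error handling are left out for brevity.

```python
import hashlib
import urllib.request

VT_URL = "https://www.virustotal.com/api/v3/files/{}"

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def build_report_request(file_hash: str, api_key: str) -> urllib.request.Request:
    """Build a VirusTotal v3 file-report lookup (not sent here)."""
    return urllib.request.Request(
        VT_URL.format(file_hash),
        headers={"x-apikey": api_key},  # VirusTotal expects the key in this header
    )
```

Send the request with `urllib.request.urlopen` and inspect the `last_analysis_stats` field of the JSON response; a hash with no report at all is itself worth noting for a widely distributed skill.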

Whitelist Your Contacts

If your agent responds to messages or makes calls on your behalf, restrict who it listens to. Configure whitelists for trusted contacts only. This prevents a compromised contact or social engineering attack from controlling your agent.
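A whitelist can be as simple as a gate in front of the agent's inbound-message handler. This sketch uses placeholder identifiers; the point is that anything not explicitly listed is dropped before the agent ever sees it.

```python
# A minimal inbound-message gate: the agent only acts on messages
# from explicitly whitelisted senders. The identifiers are illustrative.
TRUSTED_SENDERS = {
    "+15551230001",        # your own number
    "alice@example.com",   # a trusted contact
}

def should_handle(sender: str) -> bool:
    """Drop anything not on the whitelist before it reaches the agent."""
    return sender.strip().lower() in {s.lower() for s in TRUSTED_SENDERS}
```

Default-deny is the important design choice here: an unknown sender is ignored rather than handled with reduced privileges.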

Isolate Channels

Don't run your agent across all your messaging channels on a single account. If someone compromises your main WhatsApp account, they don't also get your Slack workspace or your personal email. Use separate accounts for agent channels, or at least separate workspaces.

Credential Safety: Passwords and Authentication

Your agent needs credentials to do useful work. How you manage those credentials is critical.

Use OAuth When Possible

OAuth (or similar token-based auth) is always better than passwords. With OAuth, you grant the agent access to a specific service without sharing your actual password. If the agent's token is compromised, you can revoke it without changing your password.

App-Specific Passwords

Many services (Gmail, GitHub, etc.) let you create app-specific passwords. Use these for your agent instead of your main account password. If something goes wrong, you only need to revoke that one password.

Never Share Main Credentials

Your agent should never have access to your main account credentials — the ones you use to log into Gmail, Slack, or your bank. Give it only the specific credentials it needs for specific tasks.

Secrets Management

Store credentials in environment variables or secrets files, not in your agent's configuration. If you're running on Tulip, use their built-in secrets management to keep credentials encrypted and out of code.
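A small helper makes the env-var pattern safer in practice: fail loudly when a credential is missing instead of silently falling back to a hard-coded default. The variable names below are examples, not a required convention.

```python
import os

class MissingSecretError(RuntimeError):
    """Raised when a required credential is not configured."""

def get_secret(name: str) -> str:
    """Read a credential from the environment.

    Failing loudly here is deliberate: a hard-coded fallback is exactly
    the kind of credential that ends up committed to a repository.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set")
    return value
```

Usage is one line per credential, e.g. `token = get_secret("GITHUB_TOKEN")`, with the value supplied via your shell profile, a `.env` file kept out of version control, or your platform's secrets store.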

Sandboxing: Isolating Risk

For extra security, you can run your agent in an isolated environment where its damage is contained.

NanoClaw Containers

NanoClaw lets you run agents in lightweight containers with limited access to your system. The agent can't read files outside its sandbox, can't access your main network, and can't execute arbitrary commands. Useful if you're running untrusted skills.

Docker Containers

A standard Docker container gives you similar isolation. You define exactly what resources (files, network access, APIs) the container can access. It's more overhead than NanoClaw but gives you more control.
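As a sketch of what "defining exactly what the container can access" looks like, here is a helper that builds a locked-down `docker run` command: no network, a read-only root filesystem, one writable workspace, capped memory, and all Linux capabilities dropped. The image name and paths are placeholders.

```python
# Build a locked-down `docker run` invocation for an agent container.
# Pass the result to subprocess.run() to launch it.
def sandboxed_run_cmd(image: str, workspace: str) -> list[str]:
    return [
        "docker", "run", "--rm",
        "--network", "none",              # no network access at all
        "--read-only",                    # immutable root filesystem
        "--memory", "512m",               # cap memory use
        "--cap-drop", "ALL",              # drop all Linux capabilities
        "-v", f"{workspace}:/workspace",  # the only writable path
        image,
    ]
```

Start from this maximally restrictive baseline and loosen only what the agent actually needs; for example, replace `--network none` with a restricted network if it must reach an LLM API.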

Virtual Machines

If you're running a completely untrusted agent or testing suspicious code, run it in a VM on a separate machine. It's overkill for most cases, but it's the nuclear option for maximum isolation.

Data Privacy: What Cloud Services See

If your agent uses cloud APIs (OpenAI, Anthropic, Google, etc.), those services see the data you send them. This is important to understand.

When you send a prompt to an LLM API, the service sees:

  • The prompt text (including any data your agent gathered)
  • Metadata about your request (timestamps, IP address, etc.)
  • Your API key or account

Providers typically don't sell this data, and most won't use it to train models if you've opted out, but they do store it for abuse detection and service improvement. If you're handling sensitive information, that retention is a concern.

Using Local Models

For sensitive work, run a local LLM using Ollama or similar. Your agent gets the same AI capabilities, but the data never leaves your machine. It's slower and requires more computational power, but you get privacy.
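Ollama exposes a local HTTP API (by default on port 11434), so switching a prompt to a local model is a plain POST to localhost. A minimal non-streaming sketch, assuming an Ollama server is already running with a pulled model:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> bytes:
    """JSON body for a non-streaming Ollama generation request."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask_local(model: str, prompt: str) -> str:
    """Send the prompt to a locally running Ollama server.

    The request never leaves the machine, so nothing is exposed
    to a third-party provider.
    """
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `ask_local("llama3", "Summarize this contract: ...")` keeps the contract text on your machine end to end.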

Combining Approaches

You might run a local model for sensitive tasks (financial records, medical info, personal data) and use cloud APIs for less sensitive work. Your agent can decide which to use based on context.
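That routing decision can start as something very simple. The sketch below sends anything matching sensitive keywords to a local backend and everything else to the cloud; the keyword list and backend names are placeholders you'd tailor to your own data, and a production version might use a classifier instead of substring matching.

```python
# Illustrative routing: local model for sensitive prompts, cloud for the rest.
SENSITIVE_TERMS = ("ssn", "medical", "diagnosis", "bank", "salary", "password")

def choose_backend(prompt: str) -> str:
    """Return 'local' for prompts that look sensitive, 'cloud' otherwise."""
    text = prompt.lower()
    return "local" if any(term in text for term in SENSITIVE_TERMS) else "cloud"
```

A useful property of this design is its failure mode: when in doubt, you can bias the term list toward over-matching, since the cost of routing a benign prompt locally is latency, while the cost of the reverse is a privacy leak.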

Hosting on Tulip: Managed Security

If you're running your agent 24/7 on a managed platform like Tulip, you get some security benefits out of the box:

  • Infrastructure security: Tulip handles OS patching, network security, and access controls.
  • Isolation: Your agent runs in isolation from other users' agents.
  • Secrets management: Credentials are encrypted and never exposed in logs or configuration files.
  • Audit logs: You can see exactly what your agent did and when.

This doesn't eliminate your responsibility to choose safe skills and manage credentials carefully, but it means you're not responsible for patching systems or managing network access.

A Practical Security Routine

Here's a checklist you can use:

  • Before installing a skill: Read the description, check the code, run VirusTotal, ask the creator questions if anything seems odd.
  • Before granting access: Does the agent need this permission? Can you restrict it further (whitelist specific contacts, limit file access, etc.)?
  • Before using in production: Test with limited permissions first. See what the agent actually does before you trust it with critical tasks.
  • Regularly: Audit what skills are installed and what credentials the agent has. Remove anything you're no longer using.

FAQ

Can an agent escape its sandbox?

It's theoretically possible through a container vulnerability or OS exploit, but it's rare. Sandboxing isn't foolproof, but it's a strong defense. The key is: even if an agent escapes its sandbox, it can only access what you've given it.

Should I trust open-source skills more than closed-source ones?

Not necessarily. Open-source skills are auditable (you can read the code), but that doesn't mean people will actually read it. A closed-source skill from a trusted creator might be safer than an open-source skill from a stranger. What matters is the creator's reputation and what the code actually does.

Is it safe to share my agent with other people?

Not with full access. If you want to let someone else use your agent, give them a limited interface (a web form, a chat interface) that controls what the agent can do. Don't give them direct access to the agent configuration or credentials.

What if I don't trust my cloud provider?

Use local models, or use a provider you trust more. You can also chain providers: use one for LLM calls, another for storage, etc., so no single provider sees everything.

How often should I audit my agent?

Monthly is reasonable. Check: What skills are installed? What credentials does it have? Have any new permissions been added? Remove anything you don't actively use.

Can my agent be hacked?

Yes, but with the practices outlined here, you're making it much harder. Security is about reducing risk, not eliminating it. A combination of careful skill selection, credential isolation, and sandboxing covers most attack vectors.

Get Started

Deploy an agent today

Run your first agent on Tulip in a few clicks
Deploy Agent