There are several ways to exclude your content from the search engine.
#1 Robots.txt
Robots.txt is by far the most reliable and easiest way to exclude parts of your site from the search engine index. In brief, a robots.txt file tells search engines not to crawl content at specific URLs. The format is simple and easy to follow. If a file named robots.txt is present in the top-level folder of your site, search engines will read it to learn which paths to exclude from their crawl.
An excellent resource for this technique is robotstxt.org.
If you want to keep just the Northwestern search engine out of your content, you can target the user agent "Northwestern-Search", for example:
User-agent: Northwestern-Search
Disallow: /
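One way to sanity-check a rule like this before deploying it is to run it through a robots.txt parser. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed`, the `example.edu` URL, and the "SomeOtherBot" agent are illustrative assumptions, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule from the text: block only the Northwestern-Search agent.
ROBOTS_TXT = """\
User-agent: Northwestern-Search
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL, used only for illustration.
    url = "https://www.example.edu/private/report.html"
    print("Northwestern-Search allowed:", is_allowed(ROBOTS_TXT, "Northwestern-Search", url))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

With this file, the named agent should be refused everywhere on the site, while agents that are not listed remain unrestricted.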
#2 Meta Directives
Several meta tag directives exist to communicate to search engines that they shouldn't index web content. The most commonly used one is the ROBOTS directive. The following snippet in the <head> portion of your HTML documents will instruct engines not to index your content:
<meta name="robots" content="noindex,follow">
If you have sensitive content that you would prefer the search appliance not cache, you can use the noarchive directive to instruct the appliance not to keep a copy:
<meta name="robots" content="noarchive">
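To confirm which directives a page actually carries, it can help to extract them from the rendered HTML. The sketch below uses Python's standard `html.parser`; the class name `RobotsMetaParser` and the helper `robots_directives` are hypothetical, and it only looks at tags whose name is "robots".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None when the attribute has no value.
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow"></head></html>'
    print(sorted(robots_directives(page)))
```

A page with no robots meta tag yields an empty set, which search engines generally treat as permission to index.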
#3 Web Server Headers
In some cases, it may not be possible to include the above <meta> tag in your web content. This could be because:
- The time investment is too great (your site doesn't use templates)
- Some of your content is in binary format (.pdf, .doc, .xls) and not HTML
Instead of using the <meta> directive, you can send the same information in the HTTP headers that your web server returns with each response.
If you don't want your content to appear at all, you can send:
X-Robots-Tag: noindex
If you'd just like to prevent cached copies from appearing in search results, send:
X-Robots-Tag: noarchive
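How the header gets attached depends on your web server, so the following is only a rough sketch of the idea, written with Python's built-in `http.server`. It assumes the `X-Robots-Tag` response header and a hypothetical policy of marking binary documents as noindex; the names `x_robots_tag`, `RobotsHeaderHandler`, and `serve` are illustrative, and a production site would normally set this in the web server configuration instead.

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Hypothetical policy: binary document types that should not be indexed.
NOINDEX_SUFFIXES = (".pdf", ".doc", ".xls")


def x_robots_tag(path: str):
    """Return the X-Robots-Tag value for a request path, or None for no header."""
    if path.lower().endswith(NOINDEX_SUFFIXES):
        return "noindex"
    return None


class RobotsHeaderHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, adding X-Robots-Tag when needed."""

    def end_headers(self):
        # Ignore any query string when deciding which policy applies.
        value = x_robots_tag(self.path.split("?", 1)[0])
        if value is not None:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()


def serve(port: int = 8000):
    """Start a local test server; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), RobotsHeaderHandler).serve_forever()
```

Calling `serve()` starts a local server for experimentation. Returning "noarchive" instead of "noindex" from `x_robots_tag` would correspond to the cached-copy case.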
#4 Request to Web Communications
You may request that specific resources be blocked by Web Communications staff. Please understand that you may not receive an immediate response. The contact address for this service is firstname.lastname@example.org. Please specify the URL pattern you'd like blocked when making your request and be as specific as possible.