From APACHE 2.2 Site for Windows Security Implementation Guide
Part of WG310
Search engines are constantly at work on the Internet. They are augmented by agents, often referred to as spiders or bots, which endeavor to capture and catalog web site content. In turn, these search engines make the content they obtain and catalog available to any public web user. A robots.txt file asks these agents not to index certain paths, but compliance is voluntary and the file itself publicly enumerates the locations the site owner wants hidden, so it cannot be relied on to protect data.
Locate the Apache httpd.conf file. If unable to locate the file, perform a search of the system to find the location of the file.

Open the httpd.conf file with an editor and search for the following uncommented directives: DocumentRoot & Alias.

Navigate to the location(s) specified in the Include statement(s), and review each file for the following uncommented directives: DocumentRoot & Alias.

At the top level of the directories identified after the enabled DocumentRoot & Alias directives, verify that a "robots.txt" file does not exist. If the file does exist, this is a finding.
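For reviewers who prefer to script the check, the following is a minimal Python sketch of the steps above. The httpd.conf path shown is a hypothetical example (the actual install location varies); the sketch assumes directive arguments are single tokens without embedded spaces, uses absolute paths, and does not expand wildcard Include patterns.

    from pathlib import Path
    import re

    # Assumption: adjust to the httpd.conf located in the first step.
    CONF_PATH = Path(r"C:\Apache2.2\conf\httpd.conf")

    # Match uncommented DocumentRoot, Alias, and Include directives.
    DIRECTIVE_RE = re.compile(r'^\s*(DocumentRoot|Alias|Include)\s+(.+)$', re.IGNORECASE)

    def directive_paths(conf_file):
        """Yield (directive, path) pairs for uncommented directives in a config file."""
        for line in conf_file.read_text(errors="ignore").splitlines():
            match = DIRECTIVE_RE.match(line)
            if not match:
                continue
            name, args = match.group(1), match.group(2)
            # Alias takes "url-path file-path"; the file path is the last argument.
            yield name, Path(args.split()[-1].strip('"'))

    findings = []
    for name, path in directive_paths(CONF_PATH):
        if name.lower() == "include":
            # Review each included file for DocumentRoot & Alias as well.
            if path.is_file():
                for sub_name, sub_path in directive_paths(path):
                    if sub_name.lower() != "include" and (sub_path / "robots.txt").is_file():
                        findings.append(sub_path / "robots.txt")
        elif (path / "robots.txt").is_file():
            findings.append(path / "robots.txt")

    for hit in findings:
        print(f"Finding: {hit} exists")
    if not findings:
        print("No robots.txt found at the top level of the served directories.")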
Remove the robots.txt file from the web site. If there is information on the web site that needs protection from search engines and public view, then other methods must be used to safeguard the data.
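A minimal sketch of the removal step, assuming the served directories were already identified during the check; the docroots list is a hypothetical placeholder for those locations.

    from pathlib import Path

    # Assumption: replace with the DocumentRoot/Alias directories found in the check.
    docroots = [Path(r"C:\Apache2.2\htdocs")]

    for root in docroots:
        robots = root / "robots.txt"
        if robots.is_file():
            robots.unlink()  # remove the robots.txt file from the web site
            print(f"Removed {robots}")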