What is search engine spamming?
Search engine spam is the use of unethical techniques to improve the position of a web site in a search engine’s rankings. Some web site owners use these techniques to manipulate their rankings and thereby try to fool the search engines.
Each search engine’s objective is to produce relevant results, because returning the most relevant results for a given query is what makes a search engine successful. Every search engine measures relevancy according to its own algorithm and therefore produces a different set of results. Search engine spam occurs when an attempt is made to artificially influence the basis on which a search engine calculates relevancy.
Classification of search engine spam techniques
Search engine spam techniques can be classified under the following categories:
1. Content Spam
With this technique, part of the data in a web resource (e.g. an HTML document) is visible to the search engines but not to the surfers.
Some commonly used content spam techniques are as follows:
a) Invisible text
Using fonts that are the same color as the background, or a very similar one, is one of the most common search engine spam techniques for hiding keywords. This can be done by using tables or a background area whose color differs from the real background of the site.
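As a rough illustration of how such hidden text might be spotted, the sketch below compares a text color with a background color and flags the pair when they are nearly identical. The function names and the threshold of 30 are assumptions made for this example, not values published by any search engine.

```python
from typing import Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert a '#rrggbb' string into an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def looks_hidden(text_color: str, background_color: str, threshold: int = 30) -> bool:
    """Return True when the two colors are close enough to hide the text.

    The threshold is an assumed value chosen for illustration only.
    """
    text_rgb = hex_to_rgb(text_color)
    back_rgb = hex_to_rgb(background_color)
    # Largest difference found in any single color channel.
    max_diff = max(abs(t - b) for t, b in zip(text_rgb, back_rgb))
    return max_diff <= threshold
```

With these assumptions, near-white text on a white background is flagged, while black text on white is not.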
b) Keyword stuffing
Another very popular search engine spam trick, often used along with hidden text, is repeating keywords at the bottom of the page in very small fonts.
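One simple way to picture how repetition can be measured is keyword density: the share of all words on a page taken up by a single keyword. The sketch below is a minimal illustration that handles single-word keywords only; the function name and the tokenizing rule are assumptions, and real search engines do not publish their formulas.

```python
import re
from collections import Counter


def keyword_density(text: str, keyword: str) -> float:
    """Return the fraction of words in the text that equal the keyword."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)
```

A page whose density for one keyword is far above that of ordinary prose is a candidate for closer inspection.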
2. Meta spam
In Meta spam, the Meta data (which is meant to describe a web resource) describes the resource inaccurately or incoherently in order to manipulate the relevancy calculations of search engines.
The following are common Meta spam techniques:
a) Unrelated keywords
In order to fool surfers, it has become common to use popular keywords that do not apply to the site’s content. For a while, one might trick a few people searching for such words into clicking on the link, but they will soon leave the site when they do not find any relevant information on the topic they originally searched for. This kind of search engine spam upsets both the search engines and their users.
b) Hidden tags
The use of keywords in hidden HTML tags such as title tags, comment tags, Meta description tags, style tags, http-equiv tags, hidden value tags, alt tags, font tags, author tags, option tags, and noframes tags (on sites not using frames) is considered search engine spam by some search engines, while others may allow it.
3. Identical pages
Duplicate web pages (or doorway pages) are considered search engine spam by all search engines and directories. It is not advisable to duplicate a page, give the copies different file names, and submit them all (mirror pages). This will be interpreted as an attempt to flood the engine.
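To show why renaming copies does not disguise them, the sketch below groups pages by a fingerprint of their content rather than by file name. It is a minimal illustration that catches only exact copies after case and whitespace are normalized; the function names are hypothetical, and actual search engines use their own, unpublished methods.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(html: str) -> str:
    """Hash the page after lowercasing it and collapsing whitespace."""
    normalized = " ".join(html.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_mirror_pages(pages: Dict[str, str]) -> List[List[str]]:
    """Return groups of file names whose content fingerprints match."""
    groups = defaultdict(list)
    for name, html in pages.items():
        groups[fingerprint(html)].append(name)
    # Only groups with more than one file are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```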
4. Code swapping (“bait & switch” technique)
This means optimizing a page for high search engine position, and then swapping another page in its place once a top rank is achieved. This technique does not lead to a long-lasting search engine placement.
5. Page redirects
Redirecting means taking a searcher from a page designed only for a search engine to a page designed for searchers, by using META refresh tags, CGI scripts, Java, JavaScript, or server-side techniques. This is considered spam.
Spam-filled web pages are intended for search engines only. When users visit these pages, they are redirected to the real page. Generally, search engines do not like pages that take the user to another page without the user’s intervention.
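Of the redirect methods listed above, the META refresh tag is the easiest to recognize in a page’s source. The sketch below, which uses Python’s standard html.parser module, collects the target URLs of any META refresh tags in a page. The class and function names are made up for this example, and the code does not cover redirects done with JavaScript, CGI scripts or server-side techniques.

```python
from html.parser import HTMLParser
from typing import List


class MetaRefreshFinder(HTMLParser):
    """Collects the target URLs of META refresh tags."""

    def __init__(self) -> None:
        super().__init__()
        self.targets: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/".
        _, _, rest = attr_map.get("content", "").partition(";")
        rest = rest.strip()
        if rest.lower().startswith("url="):
            self.targets.append(rest[4:].strip())


def find_meta_refresh_targets(html: str) -> List[str]:
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets
```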
6. Link farms
Link spamming is the technique of artificially increasing the link popularity of a web site in order to influence its ranking in the search engines.
The common factors of link popularity are
- the number of inbound links a web site has,
- the link popularity of the sites linking to that web site,
- the context of the sites which are linking to the web site, and
- the similarity with the linking site.
The last three factors are difficult to influence, but web masters still try link-spamming techniques.
Some common techniques are described below.
Posting messages to message boards and guest books is a very common practice among webmasters. They visit hundreds of message boards and post messages with links to their sites. However, search engines are sophisticated software and can easily detect this sort of spam.
Many messages are posted every day in guest books and message boards. This means that, apart from linking to your site, these pages also link to numerous other web sites, so their number of outbound links is large, which reduces the importance of each link. If a web site has only two outbound links and one of them points to your site, that link is more important than a link from a site that has 200 outbound links.
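The comparison between a page with two outbound links and one with 200 can be made concrete with a simple model. The sketch below assumes that a page’s importance is divided equally among its outbound links; this is a simplification chosen for illustration, not the formula of any particular search engine, and the function name is hypothetical.

```python
def link_weight(outbound_links: int) -> float:
    """Share of a page's importance passed on by each of its links.

    Assumes an equal split among all outbound links.
    """
    if outbound_links <= 0:
        raise ValueError("a linking page must have at least one outbound link")
    return 1.0 / outbound_links


# Weight of a link from a page with 2 outbound links.
small_page = link_weight(2)
# Weight of a link from a guest book page with 200 outbound links.
guest_book = link_weight(200)
```

Under this assumption the first link carries a weight of 0.5 and the second 0.005, so the link from the small page counts about a hundred times as much.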
Although link popularity is defined as the number of web sites linking to a web site, it is the quality of the links, not their number, that matters. Quality here means the context of the links as well as the link popularity of the sites that link to a particular web site. However, it is very difficult to obtain quality links artificially.
Another common method by which spammers try to influence link popularity is joining link farms. A link farm is a network of web pages that are heavily cross-linked with each other. These pages may be spread over more than one domain or more than one server. When a web site joins such a link farm, it gets a link from each of these pages and, in turn, has to link back to each of them. But search engines can very easily detect link farms as well as the web sites participating in them.
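Because every member of a link farm links back to every other member, one signal that could give a farm away is an unusually high share of reciprocated links. The sketch below computes that share for a site in a small link graph. The graph representation and the function name are assumptions for this example; the source does not say how search engines actually detect link farms.

```python
from typing import Dict, Set


def reciprocal_ratio(links: Dict[str, Set[str]], site: str) -> float:
    """Fraction of a site's outbound links that are linked back.

    `links` maps each site to the set of sites it links to.
    """
    outbound = links.get(site, set())
    if not outbound:
        return 0.0
    reciprocated = sum(
        1 for target in outbound if site in links.get(target, set())
    )
    return reciprocated / len(outbound)
```

In a fully cross-linked group the ratio is 1.0 for every member, whereas a site that links out without being linked back scores 0.0.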
7. No content
Search engines will also consider a site spam if it does not have any unique content of value to offer its visitors. In addition, illegal content, duplicated content, and sites consisting largely of affiliate links are regarded as low-value search engine spam, mainly by the directories.
8. Over submitting
Each search engine has its own limit on how many pages one can manually submit through its online form and on how often a page can be submitted. Submitting the same page more than once a month to the same search engine, or submitting too many pages each day, is not allowed.
Currently the limits are as follows: AltaVista 1-10 pages per day; HotBot 50 pages per day; Excite 25 pages per week; Infoseek 50 pages per day but unlimited when using e-mail submissions. It is to be noted that this is not the total number of pages that can be indexed; it is just the total number of pages that can be submitted.
For example, if one can submit only 25 pages to Excite and has a 1000-page site, that is no problem. The search engine itself will come to the site and index all pages, including those that have not been submitted.
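The limits quoted above can be kept in a small table and checked before submitting. The sketch below is only an illustration: the figures are copied from the text, the upper bound of the AltaVista range is used, and the function name is hypothetical.

```python
# (pages, period) per engine, as quoted in the text above.
# AltaVista is given as 1-10 pages per day; the upper bound is used.
# The Infoseek figure applies to form submissions, not to e-mail ones.
SUBMISSION_LIMITS = {
    "AltaVista": (10, "day"),
    "HotBot": (50, "day"),
    "Excite": (25, "week"),
    "Infoseek": (50, "day"),
}


def within_limit(engine: str, pages_this_period: int) -> bool:
    """Return True if the number of pages stays within the engine's limit."""
    limit, _period = SUBMISSION_LIMITS[engine]
    return pages_this_period <= limit
```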
9. Agent-Based
Agent-Based Delivery, invented at almost the same time as the web itself, can be used to deliver spam to search engines. However, as it does not depend on the existence of search engines, its use does not by itself indicate an intention to spam a search engine.
10. Cloaking
Cloaking is a technique for showing search engine spiders pages that differ from the ones normal searchers get to see. Although there are legitimate reasons for cloaking, most search engines, for example AltaVista, Google and Inktomi, consider it spam. Web sites are penalized if they are found to identify a search engine spider by IP name or address and deliver unique content to it.
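Conceptually, cloaking shows up as a difference between the page served to a spider and the page served to an ordinary visitor. The sketch below compares two copies of a page that have already been fetched and flags them when they share little content. It uses Python’s standard difflib module; the function names and the threshold of 0.5 are assumptions for illustration and do not describe how any search engine actually checks for cloaking.

```python
from difflib import SequenceMatcher


def content_similarity(spider_html: str, visitor_html: str) -> float:
    """Return a similarity score between 0.0 and 1.0, compared word by word."""
    spider_words = spider_html.lower().split()
    visitor_words = visitor_html.lower().split()
    return SequenceMatcher(None, spider_words, visitor_words).ratio()


def looks_cloaked(spider_html: str, visitor_html: str, threshold: float = 0.5) -> bool:
    """Flag the page when the two versions are less similar than the threshold."""
    return content_similarity(spider_html, visitor_html) < threshold
```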
11. E-mail spamming
E-mail spamming means receiving commercial messages in one’s mailbox from unwanted and unknown sources. Some of these messages are chain e-mails, some are get-rich schemes, and some contain adult-related content.
Although receiving the first two types of mail is really annoying, the real problem lies in receiving pornographic mail. Usually, such mail is sent by the million to untargeted e-mail addresses, and it becomes a serious problem when minors receive pornographic messages in their mailboxes.
There are various ways of collecting e-mail addresses. The easiest is to collect them from newsgroups. Newsgroups are a great source of information, and spammers use special software to harvest e-mail addresses from the articles posted there.
Penalties for search engine spamming
Not all search engines are equally strict about spam, and tricks that are perfectly acceptable to one search engine can be considered spam by another. This is one reason there is a disproportionate amount of fear about the penalties for spamming.
In some cases, engines refuse to index pages believed to contain spam, while others index the pages but rank them lower. Another option open to a search engine is banning the whole site.
Many web masters fear that they may spam the engines without their knowledge and then have their entire site banned from the engines forever. However, this just doesn’t happen that easily! The people who run the search engines know that one can be a perfectly legitimate and honest web site owner who, because of the nature of his web site, has pages that appear to be spam to the search engine.
Search engines know that it is difficult to tell exactly who is spamming and who has ended up in the spam zone by mistake. Therefore, they do not generally ban an entire site just because some of its pages look like spam. They only penalize the rankings of the offending pages; non-offending pages are not penalized. Only in the most extreme cases will a search engine ban an entire site.
For example, if a web site spams aggressively and goes against the search engines’ recommendations and floods their engine with spam pages, then it will receive a backlash in the form of an entire web site ban. Some engines, like HotBot, do not have a lifetime ban policy on spammers. As long as one is not an intentional and aggressive spammer, one should not have to worry about the entire site being penalized or banned from the engines. Only the offending pages will be penalized in ranking.
Also, on some search engines, some pages that use search engine spam techniques are still ranked high. These are actually old pages, often several years old. Had these pages been submitted today, they would have been scored low or rejected.
Don’ts for the search engines
Search engines strive to provide the most relevant results to their users, but spam clutters their index with irrelevant and misleading information. Therefore, it is advisable not to make any of the mistakes listed below. Search engines will always react to spam techniques when they become a big enough problem, and they might then ban the entire site of anyone caught using such tricks.
The following list will give you an idea of the “DONTS” for the search engines:
- Do not use text that is the same color as the background, or only slightly different, to ‘hide’ words. Also, check the background color of any table cell and make sure that text placed inside the cell is not the same color as the page background.
- Do not repeat the keywords in the Meta tags more than once, and do not use keywords that are unrelated to the site’s content.
- Do not create a title like “soft toys, soft toys, soft toys”. This is considered spam.
- Do not repeat the keyword to increase its frequency on a page (Keyword stuffing). Search engines now have the ability to analyze a page and determine whether the frequency is above a “normal” level in proportion to the rest of the words in the document.
- Do not optimize a page for top ranking, and then swap another page in its place once a top ranking is achieved.
- Do not put misleading words on the page in the hope of attracting visitors who are looking for some other topic.
- Do not submit a page to the search engines that, once loaded, automatically redirects to a page of different content.
- Do not create a page that prohibits the user from using the browser’s back button to return to the search engine results.
- Do not create too many doorways with very similar keywords.
- Do not submit the same page more than once on the same day to the same search engine.
- Do not submit pages that contain keyword filled ‘sentences’ that make no sense.
- Do not put multiple instances of the Title Tag in the HTML code. For a while, spammers had been able to enjoy success with this method, but the search engines quickly caught up with them, and this is now considered spam.
- Do not put pages of content in layers and position them off-screen or practice the same kind of behavior by turning the visibility of the layers to ‘off.’
- Do not use tiny or ‘invisible text’ in the page.
- Do not send queries to a search engine with an automated ‘rank reporting tool’ hundreds of times per day.
- Do not purchase multiple domains and put duplicate copies of the web site on each domain. Search engines can detect duplicates and may penalize the entire block.
- Do not participate in Link Popularity Programs or Farms (this means participating in link exchanges for the sole purpose of increasing the ranking of a web site in search engines).
- Do not submit different versions of the same web site (i.e. do not simply duplicate a Web page, give the copies different file names, and submit them all) in the hope of getting multiple listings. That will be interpreted as an attempt to flood the engine.
- Do not submit more than the allowed number of pages per engine per day or week. Each engine has a limit on how many pages one can manually submit to it by using its online forms.
- Do not cloak.
- Do not support affiliate sites with the same or similar content but a different site design.
- Never submit doorways to directories. Follow the submission guidelines carefully.
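Some of the items in the list above can be checked mechanically before a page is submitted. As one example, the sketch below counts the Title Tags in a page so that a page with multiple instances can be caught. It uses Python’s standard html.parser module; the class and function names are hypothetical, and the check covers only this one item from the list.

```python
from html.parser import HTMLParser


class TitleCounter(HTMLParser):
    """Counts how many <title> start tags appear in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase.
        if tag == "title":
            self.count += 1


def count_title_tags(html: str) -> int:
    counter = TitleCounter()
    counter.feed(html)
    return counter.count
```

A result greater than one means the page repeats its Title Tag and should be corrected.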
Spam guidelines
Google
Google takes all spam attempts seriously. The techniques that Google considers as spam are:
i) Hidden text
This means there should not be any text or links that can be seen by search engines but not by visitors to the site.
ii) Stuffing of pages with irrelevant words.
iii) Doorway pages
This covers generating doorway pages, or multiple domains or subdomains, with essentially the same content. It is better not to use programs that generate many generic doorway pages.
iv) Tricky redirects.
v) Cloaking
vi) Typo spam/cyber squatting (for example www.yahhoo.com)
vii) Identity theft/page jacking.
viii) Participating in link exchanges.
It is considered spam if any site participates in link exchanges for the sole purpose of increasing its ranking in search engines.
ix) Sending automated queries to Google in an attempt to monitor the ranking of one’s site.
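The typo spam item in the list above (www.yahhoo.com) can be illustrated with edit distance, which counts the single-character insertions, deletions and substitutions needed to turn one name into another. The sketch below is a standard Levenshtein distance with a small helper on top; the helper name, the choice of “yahoo” as the legitimate name, and the maximum distance of 1 are assumptions for this example, not Google’s method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute or keep a character
            ))
        previous = current
    return previous[-1]


def looks_like_typo_name(candidate: str, known_name: str, max_distance: int = 1) -> bool:
    """True when the candidate is close to, but not the same as, the known name."""
    distance = edit_distance(candidate.lower(), known_name.lower())
    return 0 < distance <= max_distance
```

Here “yahhoo” differs from “yahoo” by a single extra letter, so the helper flags it.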
Inktomi
Inktomi defines spam as an inappropriate use of Inktomi’s search engine involving any effort to deceive the search engine into returning a result that is unrelated to the query or whose position has been artificially inflated in the result set. Inktomi’s policies are designed to ensure that the practitioners of spam techniques do not degrade the search user experience in any way.
Some of the common spam practices according to Inktomi are:
i) Embedding deceptive and/or hidden text in the body of web documents (not related to actual content).
ii) Creating Meta data that does not accurately describe the content of web documents.
iii) Fabricating URLs that redirect for no legitimate purpose.
iv) The misuse of third party affiliate or referral programs.
v) Creating web documents with intentionally misleading links.
vi) Cloaking/doorway pages that feed Inktomi crawlers content that is not reflective of the actual page.
vii) Creating inbound links for the sole purpose of boosting the popularity score of the URL (i.e. Link farming or link spamming)
viii) Flooding the search results with machine-generated pages.