Moderator(s): yellowwing

elbst23pitt
Joined: Mar 14, 2005
# Posts: 2

Posted: 2005-Mar-15 07:58

I recently got this message from my dedicated web host:

You'll find that Google puts about 300 hits a week in your logfile, Yahoo puts about 32,000 hits, and MSN puts about 120,000 hits on it. I'm using the samples below that I place in /tmp/yaho

Basically, the major engines are sending bots to review my sites and it is causing my internal hit tracker to be way off.

I want to stop these bots from coming into my sites, but will that negatively affect my search engine rankings? Is there any way to stop them from coming without my rankings being affected?

Finally, how can I allow them to index my site without my internal tracker counting each crawl as a visit?



g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10438

Posted: 2005-Mar-15 21:58

If the bots can't visit your site then they will not index it.

If they don't index it, then they will not include you in their search results.


You should ban other bots that are email scrapers and any others with malicious intent.



elbst23pitt
Joined: Mar 14, 2005
# Posts: 2

Posted: 2005-Mar-17 05:11


How do I block bots with malicious intent?



g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10438

Posted: 2005-Mar-18 20:07

You need to add their user-agent, and the permissions you want to give it, to the robots.txt file in the root web folder of your site.
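
To make the robots.txt idea concrete, here is a minimal Python sketch that parses a small rule set and checks what each user-agent may fetch. The bot name "BadBot" and the helper `may_fetch` are made up for illustration. Keep in mind that robots.txt is only advisory: well-behaved crawlers honor it, but scrapers with malicious intent often ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "BadBot" is an invented user-agent name.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(user_agent: str, path: str) -> bool:
    """Return True if the rules above let this user-agent request the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print("BadBot allowed:   ", may_fetch("BadBot", "/index.html"))
    print("Googlebot allowed:", may_fetch("Googlebot", "/index.html"))
```

An empty `Disallow:` line means "nothing is disallowed", so every other crawler keeps full access.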



yellowwing
Joined: May 21, 2002
# Posts: 2526

Posted: 2005-Mar-20 17:37

Isn't there some kind of server status code to indicate that the page content has not changed since the last visit?

That would cut down on the bandwidth the robots use.



yellowwing
Joined: May 21, 2002
# Posts: 2526

Posted: 2005-Mar-20 17:56

I found this on the W3.org site:
"304 Not Modified
If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code"

Can you ask your hosting company to implement this?
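
As a rough illustration of the check behind a 304, here is a minimal Python sketch. It assumes the server knows when the page was last modified. The function name `status_for` is hypothetical and not part of any web server. Web servers typically do this already for static files, while dynamically generated pages usually need the script itself to make the comparison.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def status_for(if_modified_since: Optional[str], last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if not if_modified_since:
        return 200  # plain GET, no condition attached
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: treat as an unconditional request
    if client_time.tzinfo is None:
        # Assumption: a date without a zone is treated as GMT/UTC.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution.
    if last_modified.replace(microsecond=0) <= client_time:
        return 304  # not modified: send headers only, no body
    return 200


if __name__ == "__main__":
    page_time = datetime(2005, 3, 15, 7, 58, tzinfo=timezone.utc)
    print(status_for("Tue, 15 Mar 2005 07:58:00 GMT", page_time))
    print(status_for(None, page_time))
```

A crawler that sends the header and gets a 304 back downloads no page body, which is where the bandwidth saving comes from.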




g1smd
Staff
Joined: Jul 28, 2002
# Posts: 10438

Posted: 2005-Mar-20 17:58


That goes with the "If-Modified-Since" request header.



Dinkar
Staff
Joined: Aug 12, 2001
# Posts: 4391

Posted: 2005-Mar-20 18:40

If Yahoo and MSN are hitting your site too often, you can slow them down by using 'Crawl-delay' in robots.txt.

Example:


Code:

User-agent: msnbot
Crawl-delay: 10


This will tell MSN to wait 10 seconds before requesting the next document.
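
A quick way to check how such a rule reads is Python's standard robots.txt parser. This is only a sketch: it assumes MSN's crawler identifies itself as "msnbot", and `crawl_delay_for` is a made-up helper name. Crawl-delay is a non-standard extension, so not every crawler honors it.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: a 10-second delay for the user-agent "msnbot" only.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10
"""


def crawl_delay_for(user_agent: str):
    """Return the Crawl-delay (in seconds) that applies, or None if unset."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("msnbot:   ", crawl_delay_for("msnbot"))
    print("Googlebot:", crawl_delay_for("Googlebot"))
```

The rule applies only to the named user-agent, so a separate block is needed for each crawler you want to slow down.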




Dinkar
Staff
Joined: Aug 12, 2001
# Posts: 4391

Posted: 2005-Mar-20 18:49

"How do I block bots with malicious intent?"


You have to use an .htaccess file. I don't know much about it, but I have the following code:



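Code:

# Tag any request whose User-Agent matches (case-insensitive)
SetEnvIfNoCase User-Agent "{ADD USER AGENT HERE}" keep_out

# Refuse every request that was tagged keep_out
Order Allow,Deny
Allow from all
Deny from env=keep_out
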

Add the code to your .htaccess file and replace {ADD USER AGENT HERE} with the name of the malicious user agent. You need to repeat the SetEnvIfNoCase line for every user agent.

Examples:

SetEnvIfNoCase User-Agent "indy library" keep_out
SetEnvIfNoCase User-Agent "missigua locator" keep_out
SetEnvIfNoCase User-Agent "FndLnk" keep_out
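
To show roughly what these lines match, here is a small Python sketch that mimics the case-insensitive pattern test on the User-Agent string. It only approximates Apache's behavior for illustration. The name `is_kept_out` is invented, and the sample user-agent strings in the demo are assumptions.

```python
import re

# The three patterns from the examples above, matched case-insensitively
# anywhere in the User-Agent string (the role SetEnvIfNoCase plays).
BLOCKED_PATTERNS = ["indy library", "missigua locator", "FndLnk"]
_BLOCKED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS]


def is_kept_out(user_agent: str) -> bool:
    """Return True if any blocked pattern occurs in the User-Agent."""
    return any(rx.search(user_agent) for rx in _BLOCKED)


if __name__ == "__main__":
    # Assumed sample strings, for demonstration only.
    print(is_kept_out("Mozilla/3.0 (compatible; Indy Library)"))
    print(is_kept_out("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

Because the match is a substring test, a short pattern can also catch legitimate agents, so the patterns should be kept specific.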



[ Message was edited by: Dinkar 03/20/2005 08:21 pm ]





jsrobinson
Joined: Dec 18, 2004
# Posts: 29

Posted: 2005-Mar-22 01:01

I think the problem needs to be looked at from a different perspective: why isn't the web log reporting tool taking the bots into account and removing their hits from the usage stats?

I rewrote a significant portion of my web reporting tool specifically to do this, because I did not want to "limit" search engines' access to the sites I host/run. The User-Agent is easily found in the logs, and easily accessible from code (PHP/ASP), so this really should not be a huge technological issue for anyone (but then again, I don't know your situation...).
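
As a rough sketch of that approach (in Python rather than PHP/ASP), the snippet below skips crawler hits when counting visits. It assumes the Apache combined log format, where the User-Agent is the last quoted field. The token list, the sample log lines, and the function names are all illustrative assumptions, not a complete list of crawlers.

```python
import re

# Illustrative substrings found in the major crawlers' User-Agent strings.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")

# In the combined log format the User-Agent is the last quoted field.
_UA_RE = re.compile(r'"([^"]*)"\s*$')

# Made-up sample log lines, for demonstration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [15/Mar/2005:07:58:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.2 - - [15/Mar/2005:07:58:05 +0000] "GET / HTTP/1.0" 200 512 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [15/Mar/2005:07:58:09 +0000] "GET / HTTP/1.0" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp)"',
]


def is_crawler(log_line: str) -> bool:
    """Return True if the line's User-Agent contains a known crawler token."""
    match = _UA_RE.search(log_line)
    if not match:
        return False
    user_agent = match.group(1).lower()
    return any(token in user_agent for token in CRAWLER_TOKENS)


def count_human_hits(lines) -> int:
    """Count non-empty log lines that do not come from a known crawler."""
    return sum(1 for line in lines if line.strip() and not is_crawler(line))


if __name__ == "__main__":
    print(count_human_hits(SAMPLE_LOG))
```

Filtering at report time leaves the crawlers' access untouched, so indexing and rankings are not affected.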


© 1995  ·  iWeb, Inc  ·  DBA JimWorld Productions