    Moderator(s): yellowwing
    Member Message

    harjitsingh
    Joined: Oct 21, 2005
    # Posts: 14

    Posted: 2005-Oct-22 07:16

    Hello there,

    In robots.txt, which is generally uploaded to the root folder (www.mydomain.com/robots.txt), how can one exclude certain directories (/images, /admin, etc.) without letting anybody who reads the file see which directories are excluded? It might sound funny, but I wanted to know whether some other methods exist.

    Thanks
    HarRy



    lizardz
    Joined: Nov 12, 2004
    # Posts: 1394

    Posted: 2005-Oct-22 23:42

    Simple, you can't.

    It's a text file, it sits there, search bots request it, they read it.
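To illustrate why the file cannot hide anything, here is a minimal Python sketch. The robots.txt content is a hypothetical example built from the directories named in the question, and `disallowed_paths` is an illustrative helper, not part of any library. It shows that the same lines a compliant bot obeys are readable by any visitor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, using the directories from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /admin/
"""


def disallowed_paths(robots_txt):
    """Return every path named on a Disallow line, exactly as a human reader would see it."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# A compliant crawler parses the very same text to decide what to skip.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(disallowed_paths(ROBOTS_TXT))
print(parser.can_fetch("ExampleBot", "http://www.mydomain.com/admin/login.php"))
print(parser.can_fetch("ExampleBot", "http://www.mydomain.com/index.htm"))
```

The first line printed is the list of excluded directories, which is exactly the information the original question hoped to keep private.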

    Theoretically, you could generate it dynamically based on a check of the requesting IP range or something similar, and serve the real version only to search bots. But there's no guarantee your pages wouldn't get spidered in that case: if a bot came in from an IP range you didn't have listed, it would get the robots.txt without the blocks.
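A rough sketch of the dynamic approach, and of its weakness, might look like this in Python. Everything here is an assumption for illustration: the crawler networks are documentation-only address ranges, not real search engine ranges, and `robots_txt_for` is a hypothetical function that a web handler would call with the requesting address.

```python
import ipaddress

# Assumed crawler ranges (reserved documentation addresses, not real bot ranges).
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Version with the real blocks, meant only for known crawlers.
FULL_ROBOTS = "User-agent: *\nDisallow: /images/\nDisallow: /admin/\n"
# Version shown to everyone else, which reveals nothing and blocks nothing.
PUBLIC_ROBOTS = "User-agent: *\nDisallow:\n"


def robots_txt_for(client_ip):
    """Pick which robots.txt body to serve based on the requester's IP address."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in CRAWLER_NETWORKS):
        return FULL_ROBOTS
    return PUBLIC_ROBOTS


# A crawler inside a listed range gets the blocks.
print(robots_txt_for("192.0.2.15") == FULL_ROBOTS)
# A crawler from an unlisted range gets the version without blocks,
# so it is free to spider /admin/ and /images/.
print(robots_txt_for("203.0.113.9") == PUBLIC_ROBOTS)
```

The second call is the failure case described above: the list of ranges can never be known to be complete, so the protection is only as good as the list.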

    So the practical answer is: if you don't want anyone to be able to see a blocked part of your site, just don't let them in, and don't link to it from the main site. That's how I do it when I don't want a part of my site indexed at all.



    harjitsingh
    Joined: Oct 21, 2005
    # Posts: 14

    Posted: 2005-Oct-24 05:55

    Thank you for your suggestion.

    Since the site is a dynamic one maintained by a CMS, is there a possibility that robots will trace it looking at the backlinks to the .htm or .php pages?

    Thanks HarRy



    lizardz
    Joined: Nov 12, 2004
    # Posts: 1394

    Posted: 2005-Oct-24 20:44

    "is there a possibility that robots will trace it looking at the backlinks to the .htm or .php pages"

    If I understood this question I might be able to answer it. However, in general, if you can't do programming and you are running a CMS, then what you get is what you get; you can't change it. If you can do programming and can change components, then you can get anything you want, within reason of course.

    Robots will follow any link to any page not blocked in robots.txt, so if a link exists and is not blocked, then the robot will at some point follow it.
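As a toy illustration of that behaviour, the following Python sketch simulates a well-behaved crawler on a made-up site. The link map, page names, and the `crawl` function are all hypothetical; a real robot is far more involved, but the discovery logic is the same in spirit: follow links, skip anything robots.txt blocks.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/products.php", "/admin/"],
    "/products.php": ["/products.php?id=1"],
    "/products.php?id=1": [],
    "/admin/": ["/admin/users.php"],
    "/admin/users.php": [],
    "/private/report.htm": [],  # exists, but nothing links to it
}


def crawl(start, links, robots_lines, agent="ExampleBot"):
    """Breadth-first crawl that only fetches pages robots.txt allows."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        page = queue.popleft()
        if not rules.can_fetch(agent, page):
            continue  # blocked: never fetched, so its links are never seen
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited


print(crawl("/", LINKS, ["User-agent: *", "Disallow: /admin/"]))
```

In this sketch the linked, unblocked pages are all reached, the /admin/ pages are skipped because of the Disallow rule, and /private/report.htm is never found because no page links to it.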



    harjitsingh
    Joined: Oct 21, 2005
    # Posts: 14

    Posted: 2005-Oct-26 10:10

    I was concerned about the exclusion list because only the homepage, www.mydomain.com, was cached and not the inside pages, which are linked to it.

    I have index,follow for the robots meta tag, but the inside pages are still not getting crawled or cached.
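One way to confirm what the robots meta tag on a page actually says is to parse the HTML and read it back. The sketch below uses Python's standard `html.parser`; the class name `RobotsMetaParser`, the helper `robots_directives`, and the sample markup are illustrative assumptions, not output from the site in question.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "index,follow".
        for part in content.split(","):
            if part.strip():
                self.directives.append(part.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page head, matching the setting described in the post.
sample = '<html><head><meta name="robots" content="index,follow"></head><body></body></html>'
print(robots_directives(sample))
```

If an inside page returned "noindex" or "nofollow" here, that would explain the missing pages; if every page returns "index" and "follow", the meta tag is probably not the cause.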

    Also, when you check for links to the website, it should show the inside pages, but it doesn't.

    Can I get some help or guidance on this?

    thanks
    harRy



    Logan
    Joined: Aug 14, 2002
    # Posts: 3749

    Posted: 2005-Nov-07 14:15

    Hi harRy, based on your comments I don't think robots.txt is a factor. There are many other reasons internal pages may not be getting indexed. The two most common I can think of are:

    1) Lack of links/popularity pointing to the URL
    2) A URL with multiple parameters (e.g. mypage.php?x=1&name=product&category=1234&anotherparameter=sfruokcn)
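For the second point, a quick way to see how many parameters a URL carries is to split its query string. This is a small Python sketch using the standard `urllib.parse` module; the helper name `query_parameters` is made up for illustration, and the URL is the example from the list above.

```python
from urllib.parse import parse_qsl, urlsplit


def query_parameters(url):
    """Return the (name, value) pairs found in a URL's query string."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


url = "mypage.php?x=1&name=product&category=1234&anotherparameter=sfruokcn"
params = query_parameters(url)

print(len(params))
print([name for name, _ in params])
```

Running a site's internal links through a check like this makes it easy to spot which pages depend on long parameter strings.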

    Tough to say without reviewing. Can you reference the site within your profile for those interested in helping?



    harjitsingh
    Joined: Oct 21, 2005
    # Posts: 14

    Posted: 2005-Nov-08 12:54

    Here is the website I am talking about
    ((url removed--put in profile only))

    [ Message was edited by: bhartzer 11/25/2005 01:09 pm ]




    © 1995  ·  iWeb, Inc  ·  DBA JimWorld Productions