
How do I use the site crawler?


The site crawler imports an existing website and builds a sitemap from it.

Note: Importing a new sitemap will overwrite your existing sitemap.

To use the site crawler, follow these steps:

  1. Log in to your dashboard.
  2. Open an existing sitemap, or create a new sitemap.
  3. Click the Import/Export tab above your sitemap.



  4. Select the Import tab in the modal.
  5. Check the Use an existing site radio button.
  6. In the blank cell, enter an existing website URL.



  7. Set the following import options:
    1. Cell Text:
      1. Click Use File/Directory name to use the file or directory name as your sitemap page label. This takes the "path" part of the URL as the cell text. For example, for http://example.com/test/second-link/page the cell text is test/second-link/page.
      2. Click Use H1 to include the main header in your sitemap page label. This will use the text from the first <h1> element.
      3. Click Use Page Title to include the page title in your sitemap page label. This will use the text from the <title> tag.
    2. Follow Mode:
      1. Click Domain And All Subdomains if you want the crawler to cover the domain and all of its subdomains.

        For example, if you enter http://example.com/dir/, the Domain And All Subdomains option follows links on the example.com domain as well as on all of its subdomains, e.g., food.example.com or bar.example.com.
      2. Click Domain Only if you want the crawler to stay on the domain itself. It follows links on example.com and www.example.com only; all other subdomains are ignored.
      3. Click Domain And Directory Path Only if you want the crawler to stay within the domain and directory path. It follows only links that begin with http://example.com/dir/ or http://www.example.com/dir/.




    3. Check Add Link to add a link to imported pages.
    4. Check Add Meta Description Note to add a note with the content of the <meta name="description"> tag.
    5. Check Limit Number of Pages to set the maximum number of pages you want the import tool to crawl. For example, entering 10 will crawl a maximum of 10 pages from the given website.
      1. Enter the number of pages in the blank cell.
    6. Check Omit Directory to add directories you want the site crawler to avoid.
      1. Enter a directory name in the blank cell. Include /* after the directory name to skip everything inside the directory; without it, the crawler skips only that one page and still adds the pages inside the directory.

        For example, if you enter /articles/* and /blog/test/, the crawler ignores all pages under http://example.com/articles/ and the single page http://example.com/blog/test/.
      2. Click the plus sign to add another directory name. You can add as many directories as you want.



    7. Select Use HTTP Basic Authentication to enter your username and password for HTTP basic-authentication protected pages.
    8. Click Import.
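
The three Follow Mode options from step 7 can be read as a filter on each link the crawler finds. The sketch below is only an illustration of that rule, not the crawler's actual code: the function name `should_follow` and the mode identifiers are hypothetical, and it assumes the URL you entered is on the bare domain (for example http://example.com/dir/).

```python
from urllib.parse import urlsplit


def _host(url: str) -> str:
    """Return the lowercase host of a URL without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def should_follow(link: str, start_url: str, mode: str) -> bool:
    """Decide whether `link` would be followed under a given follow mode.

    `mode` is a hypothetical identifier: "all_subdomains", "domain_only",
    or "directory_path".
    """
    base, target = _host(start_url), _host(link)
    if mode == "all_subdomains":
        # example.com itself, plus anything ending in .example.com
        return target == base or target.endswith("." + base)
    if mode == "domain_only":
        # example.com and www.example.com only
        return target == base
    if mode == "directory_path":
        # same domain, and the path must begin with the starting directory
        prefix = urlsplit(start_url).path
        return target == base and urlsplit(link).path.startswith(prefix)
    raise ValueError(f"unknown follow mode: {mode}")
```

With http://example.com/dir/ as the starting URL, a link to http://bar.example.com/ passes only in the "all_subdomains" mode, and a link to http://www.example.com/other/ is rejected in the "directory_path" mode.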

    Depending on the size of the existing site, you may wait several minutes before your sitemap is built. The number in brackets shows how many pages have been crawled so far. For example, Gathering Links (3) means the crawler has scanned 3 pages.

    Note: The crawler can only crawl publicly accessible websites. It cannot crawl intranet sites or websites with custom authentication, that is, pages that require a username and password to log in. Pages protected by HTTP basic authentication can be crawled with the Use HTTP Basic Authentication option.
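
If you are unsure whether a protected site uses HTTP basic authentication or a custom login, the server's response usually tells you: a basic-authentication challenge is a 401 status with a WWW-Authenticate header naming the Basic scheme. The helper below is a minimal sketch of that check. The function name is hypothetical, and it assumes you already have the status code and headers from some HTTP client; it also looks only at the first scheme listed in the header.

```python
from typing import Mapping


def uses_http_basic_auth(status: int, headers: Mapping[str, str]) -> bool:
    """Return True if a response looks like an HTTP Basic challenge."""
    if status != 401:
        # Custom logins typically answer with 200 or a redirect instead.
        return False
    for name, value in headers.items():
        if name.lower() == "www-authenticate":
            # A challenge starts with the scheme, e.g. 'Basic realm="site"'.
            return value.strip().lower().startswith("basic")
    return False
```

A redirect to a login page (for example a 302 status with a Location header) returns False here, which matches the kind of custom authentication the crawler cannot handle.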

    While the crawl is running, you can close the browser window, edit another sitemap, or crawl another website on another sitemap.

    You have now successfully built a sitemap using the site crawler. Your new sitemap is ready for editing and customization!
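
The page labels in the imported sitemap depend on the Cell Text option chosen in step 7. As a rough illustration of the three choices (URL path, first <h1>, or <title>), here is a small sketch using only the Python standard library. The names `page_label` and `_LabelParser` and the `source` values are hypothetical and do not come from the crawler itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LabelParser(HTMLParser):
    """Collect the text of the <title> and of the first <h1> element."""

    def __init__(self) -> None:
        super().__init__()
        self.found = {"title": None, "h1": None}
        self._open = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        # Only the first occurrence of each tag is recorded.
        if tag in self.found and self.found[tag] is None:
            self._open = tag
            self.found[tag] = ""

    def handle_data(self, data):
        if self._open:
            self.found[self._open] += data

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None


def page_label(url: str, html: str, source: str) -> str:
    """Return a page label; `source` is "path", "h1", or "title"."""
    if source == "path":
        # Use File/Directory name: the path part of the URL.
        return urlsplit(url).path.strip("/")
    if source not in ("h1", "title"):
        raise ValueError(f"unknown label source: {source}")
    parser = _LabelParser()
    parser.feed(html)
    parser.close()
    return (parser.found[source] or "").strip()
```

For http://example.com/test/second-link/page, the "path" source gives test/second-link/page, the same cell text as in the example under Use File/Directory name.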

    The site crawler feature is only available to users with a Pro, Team, or Agency account subscription.
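
The difference between an Omit Directory entry with and without a trailing /* (step 7) is easy to get wrong, so here is a minimal sketch of the matching rule as the steps above describe it. The function name `is_omitted` is hypothetical, and two details are assumptions: the directory's own index page (for example /articles/) is treated as omitted by a /* entry, and a trailing slash is ignored when matching a single page.

```python
from typing import Iterable


def is_omitted(path: str, patterns: Iterable[str]) -> bool:
    """Return True if the URL path matches any omit entry."""
    for pattern in patterns:
        if pattern.endswith("/*"):
            # '/articles/*' skips every page below '/articles/'.
            if path.startswith(pattern[:-1]):
                return True
        elif path.rstrip("/") == pattern.rstrip("/"):
            # '/blog/test/' skips only that single page.
            return True
    return False
```

With the entries /articles/* and /blog/test/, the path /articles/2020/post is skipped, /blog/test/ is skipped, and /blog/test/child is still crawled.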
