How do I use the site crawler?

The site crawler is a way to import an existing website and create a sitemap from it.

Note: Importing a new sitemap will overwrite your existing sitemap.

To use the site crawler, follow these steps:

  1. Log in to your dashboard.
  2. Open an existing sitemap, or create a new one.
  3. Click the Import/Export tab above your sitemap.



  4. Select the Import tab in the modal.
  6. Select the Use an existing site radio button.
  6. In the blank field, enter your existing website's URL.



  7. Select one of the following options:
    1. Cell Text:
      1. Click Use File/Directory name to display a file/directory name in your sitemap page label. This will use the "path" part of the URL as cell text.
        For example, for http://example.com/test/second-link/page the cell text is test/second-link/page (see the sketch after these steps).
      2. Click Use H1 to include the main header in your sitemap page label. This will use the text from the first <h1> element.
      3. Click Use Page Title to include the page title in your sitemap page label.
        This will use the text from the <title> tag.
      4. Check Exclude Common Text Phrases On Import to remove common text that recurs in imported page titles throughout your website. For example, you can strip repetitive SEO strings like "| Company Name" that precede or follow the title text.

    2. Follow Mode:
      1. Click Domain And All Subdomains if you want the site to crawl the domain and all subdomains.
        For example, if you enter http://example.com/, our Site Crawler will fetch pages from the example.com domain as well as all subdomains, e.g. foo.example.com or bar.example.com.
      2. Select Domain Only if you would like our Site Crawler to follow links only from the specified domain, e.g. example.com (and also www.example.com).
      3. Click Domain And Directory Path Only if you would like to restrict crawling to the specified domain and directory path only. If you enter http://www.example.com/dir/, the crawler will only follow links beginning with http://example.com/dir/ and http://www.example.com/dir/.
      4. Check Don't Follow Query String Variables to exclude links containing ? and & characters, for example: http://example.com/link?param=1&page=2
        This option is helpful if you have many pagination pages or a dynamically generated calendar. (The follow-mode and omit-directory rules are illustrated in the sketch after these steps.)


    3. Check Add Link to add the page's URL to each imported page.
    4. Check Add Meta Description Note to add a note with the content of the <meta name="description"> tag.
    5. Select Limit Number of Pages to set the maximum number of pages you want the import tool to fetch. For example, entering 10 will download a maximum of 10 pages from the given website.
      1. Enter the number of pages in the number field.
    6. Check Omit Directory to add directories you want the site crawler to avoid.
      1. Enter a directory name in the blank field. Include /* after the directory name to exclude all subdirectories.

        For example, if you enter /articles/* and /blog/test/, our Site Crawler will ignore http://example.com/articles/ along with all of its subdirectories, and also the http://example.com/blog/test/ page.
      2. Click the plus sign to add another directory name. You can add as many directories as you want.

    7. Check Import meta data to Content Planner to import each page's SEO details (meta data and URL slug) into the Content Planner tool.
    8. If your website is password protected, select Use HTTP Basic Authentication and enter your username and password. (Note: currently only the HTTP Basic Auth method is supported; see the sketch after the notes below.)
    9. Click Import.
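
For reference, the Cell Text, Follow Mode, and Omit Directory options described above boil down to a handful of URL and HTML rules. The Python sketch below is only an illustration of that behavior as described in this article, not the crawler's actual code, and every function and variable name in it is invented for the example.

    # Illustrative sketch of the import options described above; not the product's actual code.
    import re
    from urllib.parse import urlparse

    def page_label(url, html, mode):
        """Derive a sitemap cell label, mirroring the Cell Text options."""
        if mode == "path":                              # Use File/Directory name
            return urlparse(url).path.strip("/") or "home"
        if mode == "h1":                                # Use H1
            match = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.I | re.S)
        else:                                           # Use Page Title
            match = re.search(r"<title[^>]*>(.*?)</title>", html, re.I | re.S)
        return re.sub(r"\s+", " ", match.group(1)).strip() if match else url

    def should_follow(url, start_url, mode, skip_query=False, omit=()):
        """Decide whether a discovered link is in scope, mirroring Follow Mode and Omit Directory."""
        link, start = urlparse(url), urlparse(start_url)
        root = (start.hostname or "").removeprefix("www.")
        host = (link.hostname or "").removeprefix("www.")
        if skip_query and ("?" in url or "&" in url):   # Don't Follow Query String Variables
            return False
        for rule in omit:                               # Omit Directory, e.g. "/articles/*" or "/blog/test/"
            if rule.endswith("*") and link.path.startswith(rule[:-1]):
                return False
            if not rule.endswith("*") and link.path.rstrip("/") == rule.rstrip("/"):
                return False
        if mode == "subdomains":                        # Domain And All Subdomains
            return host == root or host.endswith("." + root)
        if mode == "domain":                            # Domain Only
            return host == root
        return host == root and link.path.startswith(start.path)  # Domain And Directory Path Only

    # Examples from the steps above:
    print(page_label("http://example.com/test/second-link/page", "", "path"))              # test/second-link/page
    print(should_follow("http://foo.example.com/a", "http://example.com/", "subdomains"))  # True
    print(should_follow("http://example.com/link?param=1&page=2", "http://example.com/",
                        "domain", skip_query=True))                                        # False
    print(should_follow("http://example.com/articles/2019/post", "http://example.com/",
                        "domain", omit=("/articles/*",)))                                  # False

(str.removeprefix requires Python 3.9 or newer.)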

Depending on the size of the site, you may wait several minutes before your sitemap is built. The number in brackets is the number of pages downloaded so far: for example, Gathering Links (3) means the crawler has scanned 3 pages so far.

If you don't see any progress for a few minutes, click the Cancel button and try again. You can also stop the crawl manually: click the Stop & Save button to generate a sitemap from just the pages that were scanned successfully.

Note: Keep in mind that the crawler can only crawl publicly accessible websites. It cannot crawl intranet sites or websites with custom authentication, i.e. pages where a username and password are required to log in (other than HTTP Basic Authentication, described above).
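
HTTP Basic Authentication is the one exception because the credentials travel in an ordinary request header rather than through a login form. Below is a minimal sketch of fetching a protected page that way with Python's standard library; the URL, username, and password are placeholders for illustration only.

    import base64
    import urllib.request

    # Placeholder URL and credentials for illustration only.
    url = "http://example.com/protected/"
    username, password = "user", "secret"

    # HTTP Basic Auth is just an Authorization header on each request, which is
    # why a crawler can support it without handling custom login forms.
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        html = response.read().decode(response.headers.get_content_charset() or "utf-8")
    print(html[:200])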

During the crawling process you can close the browser window, edit another sitemap, or crawl another website on another sitemap.

You have now successfully built a sitemap using the site crawler. Your new sitemap is ready for editing and customization!

The site crawler feature is only available to users with a Pro, Team, or Agency subscription.

Note: For security reasons, the site crawler is limited to 5,000 pages per crawling process.
