How do I use the site crawler?

The site crawler is a way to import an existing website and create a sitemap from it.

Note: Importing a new sitemap will overwrite your existing sitemap.

To use the site crawler, follow these steps:

  1. Log in to your dashboard.
  2. Open an existing sitemap, or create a new sitemap.
  3. Click the Import/Export tab above your sitemap.



  4. Select the Import tab in the modal.
  5. Check the Use an existing site radio button.
  6. In the blank cell, enter an existing website URL.



  7. Select one of the following options to be included in the sitemap page label:
    1. Cell Text:
      1. Click Use File/Directory name to use the file or directory name as your sitemap page label. This takes the "path" part of the URL as the cell text; for example, for http://example.com/test/second-link/page the cell text is test/second-link/page.
      2. Click Use H1 to include the main header in your sitemap page label. This will use the text from the first <h1> element.
      3. Click Use Page Title to include the page title in your sitemap page label. This will use the text from the <title> tag.
      4. Exclude Common Text Patterns On Import - if you use the file name or the title tag for your page cell text, this option removes recurring text that appears throughout your website. For example, if you are using the file name, enter .php in this field to strip the period and file extension from your cell text. If you are using the page title, you can remove repetitive SEO strings such as "| Company Name" that precede or follow the title text you want to appear in the page cell (see the sketch below).

        site_crawl_1.png
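
        To picture how the cell text options work, here is a short Python sketch; the function names, sample URL, and exclusion strings are illustrative assumptions, not the crawler's actual code:

        ```python
        from urllib.parse import urlparse

        def cell_text_from_path(url):
            # "Use File/Directory name": take the "path" part of the URL,
            # e.g. http://example.com/test/second-link/page -> test/second-link/page
            return urlparse(url).path.strip("/")

        def strip_common_text(label, exclusions):
            # "Exclude Common Text Patterns On Import": remove recurring strings
            # such as a ".php" extension or an SEO suffix like " | Company Name".
            for pattern in exclusions:
                label = label.replace(pattern, "")
            return label.strip()

        print(cell_text_from_path("http://example.com/test/second-link/page"))
        # -> test/second-link/page
        print(strip_common_text("About Us | Company Name", [" | Company Name"]))
        # -> About Us
        print(strip_common_text("contact.php", [".php"]))
        # -> contact
        ```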
    2. Follow Mode:
      1. Click Domain And All Subdomains if you want the crawler to follow links on the domain and all of its subdomains.

        For example, if you enter http://example.com/dir/, the Domain And All Subdomains option will follow links from the example.com domain as well as from all of its subdomains, such as food.example.com or bar.example.com.
      2. Click Domain Only if you want the crawler to stay on the domain itself. This will only follow links from example.com and www.example.com; all other subdomains are ignored.
      3. Click Domain And Directory Path Only if you want the crawler to stay within the domain and directory path. This will only follow links that begin with http://example.com/dir/ or http://www.example.com/dir/.
      4. Don't Follow Query String Variables - excludes links containing URL parameters, for example http://example.com/this-is-link?thisis=dirty&parameter=1. Any link with a query string such as ?foo=bar&id=example will be ignored (see the sketch below).

      sajtkrauler.png
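
        Conceptually, each follow mode is a filter applied to every link the crawler discovers. The sketch below is only an assumption about how such filtering could work (the mode names mirror the options above; the helper functions are hypothetical), not the crawler's actual implementation:

        ```python
        from urllib.parse import urlparse

        def registered_domain(host):
            # Naive: treat the last two labels as the registered domain.
            # Good enough for example.com; real code would use a public-suffix list.
            return ".".join(host.split(".")[-2:])

        def should_follow(link, start_url, mode, skip_query_strings=False):
            start, found = urlparse(start_url), urlparse(link)
            if skip_query_strings and found.query:
                # Don't Follow Query String Variables: drop links such as
                # http://example.com/this-is-link?thisis=dirty&parameter=1
                return False
            root = registered_domain(start.hostname)
            host = found.hostname or ""
            if mode == "domain_and_subdomains":
                # example.com plus food.example.com, bar.example.com, ...
                return host == root or host.endswith("." + root)
            if mode == "domain_only":
                # example.com and www.example.com only
                return host in (root, "www." + root)
            if mode == "domain_and_path":
                # Stay under the starting directory, e.g. /dir/
                return host in (root, "www." + root) and found.path.startswith(start.path)
            return False

        print(should_follow("http://food.example.com/a", "http://example.com/dir/", "domain_and_subdomains"))  # True
        print(should_follow("http://food.example.com/a", "http://example.com/dir/", "domain_only"))            # False
        print(should_follow("http://www.example.com/dir/a", "http://example.com/dir/", "domain_and_path"))     # True
        ```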

    3. Check Add Link to add a link to imported pages.
    4. Check Add Meta Description Note to add a note with the content of each page's meta description tag.
    5. Check Limit Number of Pages to set the maximum number of pages you want the import tool to crawl. For example, entering 10 will crawl a maximum of 10 pages from the given website.
      1. Enter the number of pages in the blank cell.
    6. Check Omit Directory to add directories you want the site crawler to avoid.
      1. Enter a directory name in the blank cell. Add /* after the directory name to omit every page inside that directory; without /*, only that exact page is omitted and pages inside the directory will still be imported.

        For example, entering /articles/* and /blog/test/ will ignore every page under http://example.com/articles/ but only the single http://example.com/blog/test/ page (see the sketch after this step).
      2. Click the plus sign to add another directory name. You can add as many directories as you want.
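
        As a rough mental model for steps 5 and 6 (an assumption, not the tool's code), the page limit and the Omit Directory patterns could be applied like this:

        ```python
        def is_omitted(path, omit_patterns):
            # "/articles/*" omits every page under /articles/,
            # while "/blog/test/" omits only that exact page.
            for pattern in omit_patterns:
                if pattern.endswith("/*"):
                    if path.startswith(pattern[:-1]):  # keep the trailing slash, drop "*"
                        return True
                elif path == pattern:
                    return True
            return False

        def pages_to_import(paths, omit_patterns, max_pages=None):
            # "Limit Number of Pages": keep at most max_pages results.
            kept = [p for p in paths if not is_omitted(p, omit_patterns)]
            return kept if max_pages is None else kept[:max_pages]

        patterns = ["/articles/*", "/blog/test/"]
        print(is_omitted("/articles/news-item", patterns))  # True
        print(is_omitted("/blog/test/", patterns))          # True
        print(is_omitted("/blog/other-post/", patterns))    # False
        ```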

    7. Check Import meta data to Content Planner to import the metadata and URL slug of each fetched page into the Content Planner.
    8. Select Use HTTP Basic Authentication to enter a username and password for pages protected by HTTP basic authentication (see the sketch after these steps).
    9. Click Import.
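
    For context on step 8, HTTP basic authentication simply sends a username and password with every request. A minimal Python sketch of such a request (not the crawler's code; the function name and arguments are illustrative) looks like this:

    ```python
    import base64
    import urllib.request

    def fetch_with_basic_auth(url, username, password):
        # HTTP basic authentication: send "Basic " + base64("username:password")
        # in the Authorization header of the request.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(request) as response:
            return response.read().decode("utf-8", errors="replace")
    ```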

    Depending on the size of the existing site, it may take several minutes for your sitemap to be built. The number in brackets represents the number of pages crawled so far; for example, Gathering Links (3) means the crawler has scanned 3 pages.

    Note: Keep in mind that the crawler can only crawl publicly accessible websites. It cannot crawl intranet sites or websites behind custom authentication, i.e., pages that require logging in with a username and password.

    During the crawling process, you can close the browser window, edit another sitemap, or start a crawl on another sitemap.

    You have now successfully built a sitemap using the site crawler. Your new sitemap is ready for editing and customization!

The site crawler feature is only available to users on the Pro, Team, and Agency subscription plans.

Note: For security reasons, the site crawler is limited to 5,000 pages (you can fetch up to 5,000 pages in a single crawling process).
