How To Untangle the Website Architecture of a Site with 500,000+ Pages

A site with 500,000+ pages has a complicated architecture that is difficult to untangle. Delivering a good user experience on a website that size demands designers with deep web-development knowledge. To understand how the untangling process works in practice, we’ll look at examples from a real-life case study.

In many aspects, a company’s website functions similarly to a digital store or office. It is the location where consumers or clients engage with the company. It is here that customers make their initial contact with the firm and where they finally purchase its goods or services. It’s also what they utilize to form opinions about a company’s brand. A business website, like an office or store, should be appealing and friendly to visitors.

A website must be simple to browse, letting visitors easily locate what they are looking for. A clear structure is even more vital for SEO: bad site design can hurt rankings, and it is a serious problem if Google’s bots and other search engine crawlers are unable to read or traverse a website.

Even if a site owner or webmaster began with the best of intentions, it’s all too easy for site design to become jumbled and confusing. The day-to-day activities of a company often get in the way of good website structure.

Our Web Design Experience With a Client

Under an ad hoc content strategy, pages and components get added to a site without regard for the overall design. Staff churn can leave people with little grasp of a site’s broad structure in charge of maintaining it. As a consequence, the site architecture often becomes unwieldy and inefficient.

That was precisely the predicament we found ourselves in when working with a client. Their website had expanded to over 500,000 pages over time. The sheer number of pages, and the way they had been added, had caused major SEO problems. The task before us was to untangle the architecture of a massive website.

Even the largest and most complicated website can be brought into line. What you need is a well-thought-out approach and the will to see it through. We’ll talk about our experience with a client whose name has been changed to protect their identity. We hope that by doing so, we can offer some tips for dealing with your own structural SEO issues.


The first step for an outside expert coming in to advise or address technical SEO difficulties with a site is to grasp the scope of the project. You can only begin to design a plan for moving ahead if you have a comprehensive understanding of the situation. Even if you’re just trying to improve your own website, looking at it as a whole is an excellent place to start.


On smaller sites, it is feasible to evaluate the architecture carefully by hand and discover flaws. This becomes more challenging as a site grows, and it is simply not feasible for a site with over 500,000 pages. You could commit every waking hour for weeks and still only scratch the surface. This is where technical SEO tools come in.

Today there are many technical SEO and analytics solutions available. Some are free, some are paid, and a few offer both free and paid tiers. These tools can handle everything from assessing page performance to testing your structured data for implementation issues.

Crawling a Website With SEMrush

The SEMrush Bot is a quick and easy way to perform the kind of deep crawl needed to scope a large technical SEO job. A thorough crawl helps detect both fundamental and complex technical SEO problems that can afflict any website.

The following are some of the most fundamental faults it may detect:

  • Malformed URLs

  • Missing page titles

  • Missing metadata

  • Broken response codes

  • Missing canonical tags

A deep crawl using SEMrush may also assist you with some more complex tasks:

  • Identifying pagination problems

  • Assessing internal linking

  • Visualizing and diagnosing other site architectural issues

When we did a thorough scan of the site, we discovered a number of major concerns. We were able to establish a general approach for untangling the site’s architecture after identifying these concerns. Here’s a rundown of some of the problems we found.

Issue 1: Distance From the Homepage

One of the most obvious problems the deep crawl revealed was the distance between certain pages and the homepage. Some material on the site was as many as fifteen clicks away. As a general guideline, content should never be more than three clicks from your homepage.

The ‘three click rule’ serves a dual purpose: it is both user-friendly and SEO-friendly. Visitors are unlikely to click through 15 pages to reach the information they want. If they can’t find what they’re looking for promptly, they’ll leave your site and look elsewhere.

From an SEO viewpoint, limiting click distance also makes sense. The distance between a page and the site’s homepage factors into search engine algorithms: the more distant a page, the less important it is perceived to be. Furthermore, the homepage’s greater link authority does little for pages buried deep in the site.
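The click-distance check itself is easy to automate. Below is a minimal sketch assuming you have already extracted the site’s internal-link graph (for example from a crawler’s export) into a plain dictionary; the page names are hypothetical.

```python
from collections import deque

def click_depths(link_graph, homepage):
    """BFS over an internal-link graph; returns each page's click distance
    from the homepage (pages unreachable from it are simply absent)."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, ()):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical miniature site: /deep-page sits four clicks out.
graph = {
    "/": ["/category"],
    "/category": ["/sub"],
    "/sub": ["/product"],
    "/product": ["/deep-page"],
}
depths = click_depths(graph, "/")
too_deep = [page for page, depth in depths.items() if depth > 3]
```

Any page in `too_deep` violates the three-click guideline and is a candidate for relinking closer to the homepage.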


Issue 2: Only 10% of Pages Indexed

Using Google Search Console, we discovered that just 10% of the site’s pages were indexed. We also examined the site’s log files as part of our technical SEO study, which revealed a slew of additional problems.

Unindexed pages are effectively ‘unread’ by Google and other search engines. There’s little you can do from an SEO standpoint if the search engines don’t locate and ‘read’ your pages. An unindexed page will not appear in any search results and is thus worthless for SEO.


According to our deep crawl, a staggering 90% of the site’s pages fell into this category. With over 500,000 pages on the site, that equated to about 450,000 pages with little to no SEO value. It was, as we have said, a serious concern.

Issue 3: Ambiguous URL Structure

The site audit also revealed that the content lacked a consistent URL structure. URLs for pages that should have sat on the same level of the site did not reflect this, sending conflicting signals to Google about how the material was organized. This is something to think about when you set up your own blog or website.

An example is the best way to describe this problem. Assume you have a website with a wide range of products. Your URLs should follow a logical path from domain to category to sub-category to product, for example ‘website/category/sub-category/product’. That URL structure should be uniform across all product pages.

Problems emerge when any of the products have a different URL structure. If a product’s URL is instead something like ‘website/product’, that product sits on a different level of the site than the others, which confuses both search engines and visitors.
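A quick way to catch this kind of inconsistency is to validate every product URL against the expected pattern. The sketch below assumes lowercase, hyphenated slugs and a fixed category/sub-category/product depth; the pattern and sample URLs are illustrative, not the client’s actual scheme.

```python
import re

# Expected shape: /category/sub-category/product (exactly three path segments).
PRODUCT_URL = re.compile(r"^/[a-z0-9-]+/[a-z0-9-]+/[a-z0-9-]+$")

def off_pattern(urls):
    """Return product URLs that don't follow the category/sub-category/product scheme."""
    return [u for u in urls if not PRODUCT_URL.match(u)]

# The second URL sits at the wrong level of the site.
urls = ["/garden/tools/spade", "/spade"]
```

Running `off_pattern` over a full URL export from a crawler flags every page that sends a conflicting structural signal.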

Another element that needed attention, alongside the URL structure, was the number of links on each page; this was partly a sitewide issue. The menu, for example, contained almost 400 links, and the footer menu another 42. That is far more links than most visitors would ever use, and it was evident that many of them were not clicked often enough to justify their place in the menu or footer.

During the site crawl, we discovered multiple pages with 100 or more links, including the homepage. The more links a page carries (menu included), the less internal PageRank each individual link passes. It’s also a sign of a muddled site structure.

Overall, it was evident that the site’s internal linking scheme had major flaws. It hurt how Google crawled the site and made for a poor user experience.
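Counting links per page is something a crawler will report, but a spot check needs nothing beyond the standard library. The sketch below counts `<a href>` tags in a page’s HTML; the sample markup is made up for the demo.

```python
from html.parser import HTMLParser

class LinkCounter(HTMLParser):
    """Counts <a> tags that carry an href attribute."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.count += 1

def count_links(page_html):
    parser = LinkCounter()
    parser.feed(page_html)
    return parser.count

# Tiny fabricated page: two real links, one anchor without an href.
html = '<nav><a href="/a">A</a><a href="/b">B</a></nav><a name="x">no href</a>'
```

Any page where `count_links` comes back in the hundreds is worth a closer look.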

Additional Concerns

A number of additional problems surfaced during the site examination. Some would not affect the website’s architecture, but fixing them would boost search results. Here are some of the most important items we needed to address:

  • Excess Tags: The robots.txt file showed that thousands of tags were in use on the site, most with only a handful of pieces of content associated with them. There was also an opportunity to optimize the crawl budget by restricting bot access to low-value pages, which might improve page performance.

  • Boost Metadata: The metadata for ranking pages could be reviewed and updated to improve organic click-through rates.

  • Page Load Time: There was room for improvement in the page load time. This is another ranking factor for Google.

  • Expired Pages: The server log analysis revealed a number of leftover pages from expired domains that were still receiving heavy Google Bot activity; this isn’t ideal. That activity may have kept those expired pages showing up in search results.
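Spotting this kind of stray bot activity in raw server logs takes only a short script. The sketch below assumes a common combined-log format and simply matches on the Googlebot user-agent string; the sample log lines are fabricated for illustration.

```python
import re
from collections import Counter

# Minimal parser for Apache/nginx "combined" log lines -- only the fields we need.
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) .*?"(?P<agent>[^"]*)"$'
)

def googlebot_hits(lines):
    """Tally Googlebot requests per path, so stale pages with heavy
    crawl activity stand out."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            hits[match.group("path")] += 1
    return hits

# Fabricated sample: two Googlebot hits on a dead page, one human visit.
sample = [
    '1.2.3.4 - - [x] "GET /old-page HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '1.2.3.4 - - [x] "GET /old-page HTTP/1.1" 404 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '5.6.7.8 - - [x] "GET /home HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
```

Sorting the resulting counter surfaces the expired pages that most urgently need redirects.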


Our analysis and study established the extent of the endeavor ahead of us. It clearly set forth the problems that needed to be addressed and let us devise a plan for tackling them one at a time. Here’s a rundown of the important areas where we worked to enhance the website’s structure. It leaves out other work we did, such as updating meta descriptions and content on key pages, which is an important element of a website audit.

Step 1: Redirects and Other Tweaks

The first project we tackled was redirecting obsolete pages, starting with the ones that had the highest Google Bot activity. Pages were pointed to the most relevant material: sometimes a page that had replaced the original, in other cases the website’s homepage.

This gave us a fast win to get going, ensuring that any traffic arriving from the expired domains didn’t land on a broken page. Another easy early change was to the PR page’s layout, which cleared up Google’s confusion over whether it was the site’s archive page. This brought an immediate improvement in content indexing.
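One low-risk way to manage a large batch of redirects is to keep the old-to-new mapping in one place and generate the server rules from it. The sketch below emits Apache `Redirect 301` lines; the URLs are hypothetical, and on nginx you would emit `return 301` rules instead.

```python
# Hypothetical mapping of expired URLs to their closest replacements;
# pages with no natural successor fall back to the homepage.
redirects = {
    "/old-press-release": "/newsroom",
    "/retired-product": "/",
}

def to_htaccess(mapping):
    """Emit Apache mod_alias 301 rules, one per expired URL, in sorted order."""
    return "\n".join(
        f"Redirect 301 {old} {new}" for old, new in sorted(mapping.items())
    )
```

Keeping the mapping in version control makes it easy to review redirect decisions and regenerate the rules whenever a new batch of dead pages turns up.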

In addition, the site included tens of thousands of tags. We didn’t want tag pages with little substance showing up in search results, since they would give visitors a bad impression of the site, so those pages were deindexed.

Step 2: Optimizing the Menu Structure

Alongside redirecting expired pages, we worked on improving the menu structure. As mentioned, the website’s menu included over 400 links, considerably more than site users required, and some pages carried four times the recommended number of links.

While we knew the number of links in the menu was a problem, we still had to decide which ones to eliminate. Our answer was to look at which links people actually clicked, to figure out which would be most valuable to readers. We generated the data using a mix of Google Analytics and heatmap tools.

Once we had selected the most beneficial links, we began determining which category and sub-category pages needed to be in the menu. From there, we devised a site layout that placed all material within three clicks of the homepage.

Step 3: Restructuring & Clustering

With our first quick wins behind us, it was time to move on to the bigger challenge of changing the site architecture. We wanted to address the problem of click distance and make the site’s structure more logical, so that both crawlers and real people could traverse it more easily.

One of our first steps was to group related content into clusters: related pages are grouped together and linked to from a pillar topic page. That pillar page sits only a click or two from the homepage, so a vast assortment of related pages becomes reachable within three clicks. Content clusters like this make it much simpler for visitors and crawlers to navigate a site’s content.

We began organizing the massive number of pages using clusters. For the overall structure, we followed the notion of a content pyramid, the gold standard for site architecture. The homepage sits at the summit of the pyramid, category pages below it, sub-categories below those, and individual pages form the pyramid’s broad base.
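If you model the pyramid as a nested category tree, verifying that nothing sits deeper than three levels below the homepage takes only a few lines. The category names below are invented for illustration.

```python
def max_depth(tree, depth=0):
    """Deepest level in a nested category tree (homepage = level 0)."""
    if not tree:
        return depth
    return max(max_depth(subtree, depth + 1) for subtree in tree.values())

# Hypothetical pyramid: homepage -> categories -> sub-categories -> products.
site = {
    "gardening": {"tools": {"spade": {}, "rake": {}}},
    "outdoor": {"furniture": {"bench": {}}},
}
```

A result of 3 or less means every page stays within the three-click guideline, assuming each level links down to the next.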

As the content clusters and pyramid took shape, the problem of links resolved itself. With a defined structure, interlinking pages was considerably easier, and no page was left with hundreds of unnecessary links.

Step 4: Improving the URL Structure

Once we had restructured the site, optimizing the URL structure was a breeze. Thanks to the content pyramid, the site’s URLs could follow the logical flow we discussed earlier. Pages were no longer strewn across various levels in an ad hoc manner.

The URLs of the site’s pages were aligned with a single established structure, eliminating the conflicting signals sent to Google. The search engine could now interpret, and hence index, the site’s content far more easily.

Step 5: Adding Dimensions to Images

For many websites, image file size is a key factor in page performance, and images are often poorly optimized, if at all. Specifying dimensions for images can drastically improve page load, because the browser reserves the right space and displays the image correctly the first time.

By adding dimensions to the photos, we were able to decrease the size of the category pages from over 25 MB to a more manageable 2-4 MB each page. This dramatically improved page performance and user experience while also lowering server load.
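To fill in width and height attributes in bulk, you need each image’s pixel dimensions. For PNGs these can be read straight from the file header with the standard library, no imaging package required; the sketch below parses only the IHDR chunk and uses a fabricated header for the demo.

```python
import struct

def png_dimensions(data):
    """Read width and height from a PNG's IHDR chunk (bytes 16-24),
    e.g. to fill in <img width="..." height="..."> attributes."""
    if data[:8] != b"\x89PNG\r\n\x1a\n":
        raise ValueError("not a PNG")
    return struct.unpack(">II", data[16:24])

# Minimal fabricated header: PNG signature + IHDR length/type + 640x480.
header = (
    b"\x89PNG\r\n\x1a\n"
    + struct.pack(">I", 13)   # IHDR chunk length
    + b"IHDR"
    + struct.pack(">II", 640, 480)  # width, height
)
```

For JPEGs or other formats a library such as Pillow is the more practical route, but the idea is the same: read the dimensions once and write them into the markup.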



Our efforts had a noticeable effect on the site. Changes to the structure had a significant and quick impact on indexing, and site traffic rose in step. The proportion of pages indexed by Google climbed from 10% to 93% within the first three months, and the proportion of valid URLs increased as well.

Unsurprisingly, having 350,000 pages newly indexed drove an increase in site traffic. Visits rose 27% in the first three months and 1200% after nine months.


The specific actions we took to enhance this site’s architecture may not all apply to yours. The overall approach, however, will. It is an approach that can deliver results even for the largest and most complex websites.

To begin, do your research: thorough investigation and analysis of your site should define the scope of the task. From there, build a plan for your improvements. Complete the work according to the plan, and you will get results.


Frequently Asked Questions

How can I improve my website architecture?

A: Focus on a clear, logical structure. Keep every page within three clicks of the homepage, use a consistent URL hierarchy, and interlink related pages from pillar topic pages. Crawling your site with a technical SEO tool will reveal pages that are buried too deep, orphaned, or sending conflicting signals to search engines.

Which website architecture is best for SEO?

A: I would recommend a website architecture that is not only optimized for SEO but also makes it easy for you to create content. If your goal is to build an online destination that succeeds in the search engines, WordPress may be the best option.

How do search engines deal with a poor site structure?

A: Search engines try to make sense of a website by looking for simple, easily understandable signals, chiefly links and content. Crawlers follow internal and external links and read their anchor text to understand how pages relate, and they use on-page content and metadata to judge what each page is about. A poor site structure makes both harder, so fewer pages get crawled and indexed.
