Web content analysis
SEO | 22.10.2016
In a few simple steps you can find out which queries a competitor's website targets and which headlines it uses to attract users' attention. You can also check how readable its content is and which articles or sections are shared most on social networks. What do you need to be able to do that? The competitor's sitemap or a list of article URLs.
Step 1
Insert the URLs the user needs data about
To insert data, the user goes to the URL Miners section and imports a URL dataset using one of the import methods. In this case, importing the competitor's sitemap, or the user's own sitemap, is probably the most appropriate.
It is useful to give the report a name by clicking on Dataset name, which makes the report easy to identify later. A user who wants to save the input dataset for future use can check the Save dataset box.
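If only a sitemap is available, the list of URLs to import can be pulled out of it first. The following is a minimal sketch, assuming a standard sitemaps.org `urlset` document; the function name `urls_from_sitemap` and the example.com URLs are hypothetical and are not part of the tool itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> value of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if loc.text and loc.text.strip():
            urls.append(loc.text.strip())
    return urls


# Hypothetical sitemap content, used only for illustration.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/first-post</loc></url>
  <url><loc>https://example.com/blog/second-post</loc></url>
</urlset>"""

example_urls = urls_from_sitemap(EXAMPLE_SITEMAP)
for url in example_urls:
    print(url)
```

The resulting list can then be pasted or uploaded as the input dataset. Sitemap index files, which point to other sitemaps, are not handled by this sketch.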
By pushing this button, a user gets to the miners selection.
Miner selection and collecting data
The best miners for efficiency analysis and content targeting analysis are:
|Content Analysis||It gives us data on website content, its structure, headlines, number of words,...|
|Social Signals||Data on amount of URL shares on social networks|
Thanks to these miners, the user learns which content and topics a URL covers and how far it reaches on social networks. Users are, of course, free to add other miners to collect more data about the URLs.
The user then clicks Get data, which takes them to the data processing section. Depending on the data volume, the data is processed in the background and, once processing is complete, the results are emailed to the user.
|Keyword/URL||URL that data was collected about|
|Facebook shares||Number of shares on social network Facebook|
|Google +1||Number of +1s for the URL on social network Google Plus|
|LinkedIn shares||Number of shares on social network LinkedIn|
|Pinterest pin count||Number of pins of this URL on social network Pinterest|
|Stumbleupon views||Number of URL views on social network Stumbleupon|
|ContentAnalysis||Content analysis status (if there is redirect or error on given URL, it sends back error)|
|Canonical||Canonical URL identification, if it's indicated in source code|
|Meta description||Content of meta tag description|
|H1||Main title(s) content|
|Title||Website title (<title> in source code)|
|Title Score||Score of the page title based on its length and on whether it contains a special character and an adjective. According to international studies, titles with these elements have a higher CTR in SERP.|
|Adjective in title||Detection whether there is an adjective present in the title|
|Special character in title||Detection whether there is a special character present in the title|
|Words||Number of words on given URL|
|Words without stop words||Number of words after cutting off stop words (example of stop words: and, but,...)|
|Paragraphs||Number of paragraphs in the content of given URL|
|Links||Number of links (internal, as well as external)|
|Number of external links||Number of external links|
|Number of internal links||Number of internal links|
|X-Robots-Tag||X-Robots-Tag content, if present. More on X-Robots-Tag at: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag|
|Meta robots||Meta tag robots content|
|Rel="next"||Detection whether there is a following page marking anywhere in the content with rel="next"|
|Rel="prev"||Detection whether there is a previous page marking anywhere in the content with rel="prev"|
|Article Type||Automatic categorization of article based on its title (for example Instructions, Quiz,..)|
|Comments||Number of comments for article|
|Flesch Kincaid Reading Ease||http://en.wikipedia.org/wiki/Flesch-Kincaid#Flesch_Reading_Ease|
|Flesch Kincaid Grade Level||http://en.wikipedia.org/wiki/Flesch-Kincaid|
|Gunning Fog Score||http://en.wikipedia.org/wiki/Gunning-Fog_Index|
|Coleman Liau Index||http://en.wikipedia.org/wiki/Coleman-Liau_Index|
|Automated Readability Index||http://en.wikipedia.org/wiki/Automated_Readability_Index|
|Dale-Chall Readability Score||https://en.wikipedia.org/wiki/Dale–Chall_readability_formula|
|Spache Readability Score||https://en.wikipedia.org/wiki/Spache_readability_formula|
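To give an idea of what the readability columns measure, here is a minimal sketch of the Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The sentence splitting and the vowel-group syllable counter are rough assumptions made for illustration, so the numbers will not necessarily match those in the report, and the way the miner itself computes the score is not documented here.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher values mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


simple_score = flesch_reading_ease("The cat sat on the mat. It was warm.")
dense_score = flesch_reading_ease(
    "Comprehensive organizational restructuring necessitates considerable deliberation."
)
print(simple_score > dense_score)
```

Short sentences made of short words score high, while long sentences full of polysyllabic words score low, which is why the second example ends up far below the first.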
The user can then analyze the output with any tool that works with XLSX files. We recommend the step-by-step analysis instructions below:
|Excel instructions||Link to download tool|
|OpenRefine instructions||Link to download tool|
|Tableau Public instructions||Link to download tool|
Examples of content analysis use in practice
The sample below covers part of the articles from the blog of H1.cz. What does the output for this company's blog posts tell us? What articles do they publish? Which ones are successful and what is behind their success? First, a few facts and a summary of the output:
The average title length is around 47.49 characters, which is appropriate, because search results truncate a title that exceeds the length limit and show only part of it. A longer title can also be less readable for a user. A title score of 69.28 is well above average compared with most Czech or international websites. This means that the blog's authors make sure a headline is attractive before publishing. According to the statistics, 23 articles have a title score of 0-50, which means that the title is not readable (too long even without stop words) or that there is another problem. These pages should be checked and their titles reconsidered.
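Figures like these can be recomputed from the exported report. The sketch below assumes the rows have already been loaded as dictionaries keyed by the column names listed above (Keyword/URL, Title, Title Score); the sample rows and the function name `title_summary` are hypothetical.

```python
from statistics import mean

# Hypothetical rows standing in for the exported report; real values will differ.
rows = [
    {"Keyword/URL": "https://example.com/a", "Title": "How to write better headlines", "Title Score": 72},
    {"Keyword/URL": "https://example.com/b", "Title": "News", "Title Score": 35},
    {"Keyword/URL": "https://example.com/c", "Title": "Why long titles get cut off in search results", "Title Score": 50},
]


def title_summary(rows, low_score_limit=50):
    """Average title length and score, plus URLs whose score is at or below the limit."""
    lengths = [len(r["Title"]) for r in rows]
    scores = [r["Title Score"] for r in rows]
    low = [r["Keyword/URL"] for r in rows if r["Title Score"] <= low_score_limit]
    return {
        "avg_title_length": mean(lengths),
        "avg_title_score": mean(scores),
        "low_score_urls": low,
    }


summary = title_summary(rows)
print(summary["low_score_urls"])
```

The URLs in `low_score_urls` are the ones whose titles are worth reviewing first.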
Most of the articles on this blog have around 900 words. Specifically, over 60% of the articles have more than 600 words. This means that the blog publishes long, extensive articles. In addition, the Article Type column gives a summary of the detected article types. In this case, the blog clearly publishes mostly How to articles, that is, instructions, followed by Which and Why articles.
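The same kind of summary can be sketched in a few lines, assuming the Words and Article Type columns have been loaded into dictionaries. The sample data and the helper names `share_above` and `article_type_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical rows; only the two columns used here are included.
articles = [
    {"Words": 950, "Article Type": "How to"},
    {"Words": 1200, "Article Type": "How to"},
    {"Words": 450, "Article Type": "Why"},
    {"Words": 700, "Article Type": "Which"},
    {"Words": 300, "Article Type": "How to"},
]


def share_above(rows, min_words=600):
    """Fraction of rows whose word count is strictly greater than min_words."""
    if not rows:
        return 0.0
    long_count = sum(1 for r in rows if r["Words"] > min_words)
    return long_count / len(rows)


def article_type_counts(rows):
    """Article types ordered from most to least frequent."""
    return Counter(r["Article Type"] for r in rows).most_common()


long_share = share_above(articles)
type_counts = article_type_counts(articles)
print(long_share, type_counts[0])
```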
In this case, user activity means two metrics: the number of social signals and the number of comments. Both indicate the character and priority of an article. These metrics are available in the individual output columns from the Social Signals miner and in the Comments column (the number of comments on an article). Simply sorting the output by these columns leads the user to the most commented and most shared content, which can serve as inspiration.
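Such a sort can be sketched as follows, assuming the rows use the column names from the output table. Summing the four share-type columns into a single total is an assumption made for this example, not something the report provides, and the sample data is hypothetical.

```python
# Share-type columns from the Social Signals miner output.
SHARE_COLUMNS = ["Facebook shares", "Google +1", "LinkedIn shares", "Pinterest pin count"]

# Hypothetical rows standing in for the exported report.
report = [
    {"Keyword/URL": "https://example.com/a", "Facebook shares": 120, "Google +1": 4,
     "LinkedIn shares": 30, "Pinterest pin count": 0, "Comments": 2},
    {"Keyword/URL": "https://example.com/b", "Facebook shares": 15, "Google +1": 1,
     "LinkedIn shares": 5, "Pinterest pin count": 2, "Comments": 18},
    {"Keyword/URL": "https://example.com/c", "Facebook shares": 60, "Google +1": 0,
     "LinkedIn shares": 10, "Pinterest pin count": 1, "Comments": 7},
]


def total_shares(row):
    """Sum of all share-type columns; missing or empty values count as zero."""
    return sum(row.get(col, 0) or 0 for col in SHARE_COLUMNS)


def comment_count(row):
    """Number of comments; missing or empty values count as zero."""
    return row.get("Comments", 0) or 0


def top_by(rows, key_func, n=3):
    """The n rows with the highest key_func value, highest first."""
    return sorted(rows, key=key_func, reverse=True)[:n]


most_shared = top_by(report, total_shares, n=1)
most_commented = top_by(report, comment_count, n=1)
print(most_shared[0]["Keyword/URL"], most_commented[0]["Keyword/URL"])
```

In the sample data the most shared and the most commented articles are different URLs, which shows why it is worth looking at both metrics.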