Web content analysis

SEO  |  22.10.2016

In a few simple steps, you can find out which queries a competitor's website targets and which headlines it uses to attract users' attention, how readable its content is, and which articles or sections are shared most on social networks. What do you need for that? Their sitemap or a list of article URLs.

Step 1

Insert the URLs that the user needs data about

To insert data, the user goes to the URL Miners section and imports a URL dataset using one of the import methods. In this case, importing the competitor's sitemap or the user's own sitemap is probably the most appropriate.
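If the sitemap is only available as an XML file, the URL list for the clipboard import can be pulled out of it first. The following is a minimal sketch that assumes a plain sitemap in the standard sitemaps.org format (a urlset, not a sitemap index). The function name and the sample URLs are hypothetical and are not part of the tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text: str) -> list[str]:
    """Return every <url><loc> value found in a plain sitemap document."""
    root = ET.fromstring(xml_text)
    locs = root.findall(f"{SITEMAP_NS}url/{SITEMAP_NS}loc")
    return [loc.text.strip() for loc in locs if loc.text]


# Hypothetical sitemap content, used only for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/first-post</loc></url>
  <url><loc>https://example.com/blog/second-post</loc></url>
</urlset>"""

urls = urls_from_sitemap(SAMPLE_SITEMAP)
# One URL per line, ready to be pasted through the clipboard.
print("\n".join(urls))
```

The printed list has one URL per line, which is a convenient shape for pasting into an import field.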

Example of inserting URLs through the clipboard in the URL Miners section

It is useful to give the report a name by clicking on Dataset name, which makes the report easy to identify later. A user who wants to save the input dataset for future use or for other purposes can check the Save dataset box.

Example of saving dataset

By pushing this button, the user gets to the miner selection.

Step 2

Miner selection and collecting data

The best miners for analyzing content efficiency and content targeting are:

Content Analysis: Data on website content, its structure, headlines, number of words, ...
Social Signals: Data on the number of URL shares on social networks

Thanks to them, the user learns which content and topics a URL covers, as well as its reach on social networks. Every user can, of course, add other miners to collect more data about the URLs.

Example of content analysis miner selection

The user then clicks Get data, which takes them to the data processing section. Depending on the data volume, the data is processed in the background, and once processing is complete, the results are emailed to the user.

Output example

Example of final report of content analysis: https://www.marketingminer.com/cs/report/cd3b2a6b3e0cec388ebbd483bda050514e454a84b841f26b7c6dacd2308b2add/visualize/miner.url.content_analysis

Column description

Column: Data
Keyword/URL: URL that the data was collected about
Facebook shares: Number of shares on the social network Facebook
Google +1: Number of +1s for the URL on the social network Google Plus
LinkedIn shares: Number of shares on the social network LinkedIn
Pinterest pin count: Number of pins of this URL on the social network Pinterest
Stumbleupon views: Number of URL views on the social network Stumbleupon
ContentAnalysis: Content analysis status (an error is returned if the given URL redirects or responds with an error)
Canonical: Canonical URL, if one is indicated in the source code
Meta description: Content of the meta description tag
H1: Content of the main heading(s)
Title: Website title (<title> in source code)
Title Score: Score of the page title based on its length and on whether it contains a special character and an adjective. According to international studies, titles with these elements have a higher CTR in the SERP.
Adjective in title: Detection of whether an adjective is present in the title
Special character in title: Detection of whether a special character is present in the title
Words: Number of words on the given URL
Words without stop words: Number of words after removing stop words (examples of stop words: and, but, ...)
Paragraphs: Number of paragraphs in the content of the given URL
Links: Number of links (internal as well as external)
Number of external links: Number of external links
Number of internal links: Number of internal links
X-Robots-Tag: X-Robots-Tag content, if present. More on X-Robots-Tag at: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
Meta robots: Content of the meta robots tag
Rel="next": Detection of whether a following page is marked anywhere in the content with rel="next"
Rel="prev": Detection of whether a previous page is marked anywhere in the content with rel="prev"
Article Type: Automatic categorization of the article based on its title (for example Instructions, Quiz, ...)
Comments: Number of comments on the article
Flesch Kincaid Reading Ease: http://en.wikipedia.org/wiki/Flesch-Kincaid#Flesch_Reading_Ease
Flesch Kincaid Grade Level: http://en.wikipedia.org/wiki/Flesch-Kincaid
Gunning Fog Score: http://en.wikipedia.org/wiki/Gunning-Fog_Index
Coleman Liau Index: http://en.wikipedia.org/wiki/Coleman-Liau_Index
SMOG Index: http://en.wikipedia.org/wiki/SMOG_Index
Automated Readability Index: http://en.wikipedia.org/wiki/Automated_Readability_Index
Dale-Chall Readability Score: https://en.wikipedia.org/wiki/Dale–Chall_readability_formula
Spache Readability Score: https://en.wikipedia.org/wiki/Spache_readability_formula
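To make two of the columns above more concrete, the sketch below counts words without stop words and computes the Flesch Reading Ease score using the published formula 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). How the tool itself tokenizes text, which stop-word list it uses, and how it counts syllables is not documented here, so the tiny stop-word set and the vowel-group syllable heuristic are assumptions made only for illustration.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the tool's real one.
STOP_WORDS = {"a", "an", "and", "but", "in", "of", "or", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_without_stop_words(text: str) -> int:
    """Number of words left after dropping stop words."""
    return sum(1 for word in tokenize(text) if word not in STOP_WORDS)


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher values mean easier text."""
    words = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(word) for word in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


sample_text = "The cat sat. The dog ran and hid."
print(count_without_stop_words(sample_text))
print(round(flesch_reading_ease(sample_text), 3))
```

Because the syllable heuristic is crude, scores from this sketch will not match the report values exactly. It is only meant to show what the columns measure.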
Step 3

Output analysis

The user can then analyze the output with tools that can work with XLSX files. We recommend the step-by-step analysis instructions below:

Excel: instructions, link to download the tool
OpenRefine: instructions, link to download the tool
Tableau Public: instructions, link to download the tool

Examples of content analysis use in practice

Consider the output mentioned above:

It covers a portion of the articles from the H1.cz blog. What does the output tell us about this company's blog posts? What articles do they publish? Which ones are successful, and what is behind their success? First, a few facts and a summary of the output:

Example of final content analysis report

Title optimization

The average title length is around 47.49 characters, which is appropriate, because a search result shows only part of a title that exceeds the displayed length, and a longer title can also be harder for a user to read. The title score of 69.28 is well above average compared with most Czech or international websites. This suggests that the blog's authors make sure a headline is attractive before publishing. According to the statistics, 23 articles have a title score of 0-50, which means that the title is not readable (too long even without stop words) or has another problem. These pages should be checked and their titles reconsidered.
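The same kind of summary can be computed outside a spreadsheet. The sketch below assumes the report rows have already been loaded into a list of dictionaries keyed by the column names Title and Title Score. The sample rows and the function name are hypothetical, and the 0-50 range is taken from the paragraph above.

```python
from statistics import mean

# Hypothetical rows; in practice they would come from the exported report.
title_rows = [
    {"Title": "How to write better headlines", "Title Score": 80},
    {"Title": "SEO news", "Title Score": 40},
    {"Title": "Why titles matter", "Title Score": 60},
]


def summarize_titles(rows: list[dict], low_score_limit: int = 50) -> dict:
    """Average title length and score, plus titles scoring at or below the limit."""
    return {
        "avg_length": mean(len(row["Title"]) for row in rows),
        "avg_score": mean(row["Title Score"] for row in rows),
        "low_score_titles": [
            row["Title"] for row in rows if row["Title Score"] <= low_score_limit
        ],
    }


title_summary = summarize_titles(title_rows)
print(title_summary)
```

The low_score_titles list is the set of pages whose titles are worth reviewing first.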

Content information

Most articles on this blog have around 900 words. Specifically, over 60% of the content has more than 600 words, which means that the blog publishes long, extensive articles. In addition, the Article Type column gives a summary of the detected article types. In this case, the blog clearly publishes mostly How to articles, that is, instructional articles, followed by Which and Why articles.
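A short sketch of how these two figures could be derived from the report rows, assuming the rows are dictionaries keyed by the Words and Article Type columns. The sample values are made up for illustration and are not the H1.cz data.

```python
from collections import Counter

# Hypothetical rows standing in for the exported report.
content_rows = [
    {"Words": 950, "Article Type": "How to"},
    {"Words": 1200, "Article Type": "How to"},
    {"Words": 400, "Article Type": "Which"},
    {"Words": 700, "Article Type": "Why"},
    {"Words": 300, "Article Type": ""},
]


def share_of_long_articles(rows: list[dict], min_words: int = 600) -> float:
    """Fraction of articles with more than min_words words."""
    if not rows:
        return 0.0
    long_count = sum(1 for row in rows if row["Words"] > min_words)
    return long_count / len(rows)


def article_type_counts(rows: list[dict]) -> Counter:
    """Count detected article types, skipping rows with no detected type."""
    return Counter(row["Article Type"] for row in rows if row["Article Type"])


long_share = share_of_long_articles(content_rows)
type_counts = article_type_counts(content_rows)
print(f"{long_share:.0%} of articles have more than 600 words")
print(type_counts.most_common())
```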

User activity

In this case, user activity means two metrics: the number of social signals and the number of comments. Both indicate an article's character and priority. These metrics are available in the individual output columns from the Social Signals miner and in the Comments column (the number of comments on an article). Simply sorting the output leads the user to the most commented and most shared content, which can serve as inspiration.
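The sorting described above can be sketched as follows, assuming each report row is a dictionary keyed by the column names from the column description. Adding the individual share columns into one total is an assumption made for illustration, not something the report does itself, and the sample rows are hypothetical.

```python
# Columns from the Social Signals miner that count shares, +1s or pins.
# Stumbleupon views are left out here because they are views, not shares.
SHARE_COLUMNS = [
    "Facebook shares",
    "Google +1",
    "LinkedIn shares",
    "Pinterest pin count",
]

# Hypothetical rows standing in for the exported report.
activity_rows = [
    {"Keyword/URL": "https://example.com/a", "Facebook shares": 10,
     "Google +1": 1, "LinkedIn shares": 5, "Pinterest pin count": 0, "Comments": 2},
    {"Keyword/URL": "https://example.com/b", "Facebook shares": 50,
     "Google +1": 0, "LinkedIn shares": 0, "Pinterest pin count": 3, "Comments": 0},
    {"Keyword/URL": "https://example.com/c", "Facebook shares": 3,
     "Google +1": 0, "LinkedIn shares": 1, "Pinterest pin count": 0, "Comments": 9},
]


def total_shares(row: dict) -> int:
    """Sum of all share-like columns; missing or empty values count as 0."""
    return sum(int(row.get(column) or 0) for column in SHARE_COLUMNS)


def most_shared(rows: list[dict], n: int = 3) -> list[dict]:
    """Top n rows by total shares, highest first."""
    return sorted(rows, key=total_shares, reverse=True)[:n]


def most_commented(rows: list[dict], n: int = 3) -> list[dict]:
    """Top n rows by number of comments, highest first."""
    return sorted(rows, key=lambda row: int(row.get("Comments") or 0), reverse=True)[:n]


top_shared_urls = [row["Keyword/URL"] for row in most_shared(activity_rows, 2)]
top_commented_urls = [row["Keyword/URL"] for row in most_commented(activity_rows, 1)]
print(top_shared_urls)
print(top_commented_urls)
```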
