Marketing Festival: Presentation resources

Presentation:

How to get inside the search engine crawler head - Marketing Festival from Filip Podstavec

ELK access for Marketingfestival.com website:

URL: https://elk.marketingfestival.cz/
Username: elk
Password: mktfest

Examples from our diagnosis:

#1 Bot requests

#2 Googlebot agents

#3 Most visited URLs

#4 Pie chart of Googlebot status codes

#5 Googlebot errors

#6 User errors

#7 IP requests

#8 Googlebot IPs

Useful tools:

Check if IP really belongs to Googlebot (see the verification sketch after this list)

Static log analysis tools

BigQuery

Screaming Frog Log Analyzer

OpenRefine


Real-time log analysis tools

ELK

Graylog

Botify

Logz.io
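
A minimal sketch of that Googlebot verification in Python, using the reverse DNS plus forward DNS check that Google recommends: look up the hostname for the IP, make sure it ends in googlebot.com or google.com, then confirm the hostname resolves back to the same IP. The sample IPs below are only illustrative.

import socket

def is_real_googlebot(ip):
    try:
        # Reverse DNS: get the hostname registered for this IP.
        host, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    # Genuine Googlebot hosts end with googlebot.com or google.com.
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        # Forward DNS: the hostname must resolve back to the original IP.
        _, _, addresses = socket.gethostbyname_ex(host)
    except socket.gaierror:
        return False
    return ip in addresses

if __name__ == "__main__":
    print(is_real_googlebot("66.249.66.1"))    # an IP from Google's crawler range
    print(is_real_googlebot("203.0.113.10"))   # documentation-range IP, not Googlebot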


Useful articles:

How to fight for your logs

What Crawl Budget Means for Googlebot

Crawl Optimization from AJ Kohn (nice article)

Closing a spider trap - Fix crawl inefficiencies

Taming the Spiders: Optimize Your Crawl Budget to Boost Indexation and Rankings

Useful presentations:

Negotiating crawl budget with Googlebots (Dawn Anderson)

How to optimize Google's crawl budget? - BrightonSEO 2017 (OnCrawl)

Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of URL Importance Optimization Pubcon Vegas 2016 (Dawn Anderson)

How to Optimize Your Website for Crawl Efficiency (SEMrush)

Where to download ELK:

https://www.elastic.co/downloads

How to install ELK:

https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04

https://www.howtoforge.com/tutorial/how-to-install-elastic-stack-on-ubuntu-16-04/

https://www.rosehosting.com/blog/install-and-configure-the-elk-stack-on-ubuntu-16-04/

Email to your developers:

Hello my dev friends!
How are you? I have an important request for you: I was at Marketing Festival and there was a presentation about the importance of access logs. What I want to ask is whether we keep access logs and, if we do, where they are stored.
Logs are a very important resource for my future work, so could you please send me an export of our access_log file from the last 7 days? Access to our log analysis tool would be even better, if we have one.

If we don't have any log analysis tool, do you think we could set up the ELK stack on our server? Here are some tutorials on how to do that:
- https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04
- https://www.howtoforge.com/tutorial/how-to-install-elastic-stack-on-ubuntu-16-04/
- https://www.rosehosting.com/blog/install-and-configure-the-elk-stack-on-ubuntu-16-04/

You can download ELK here: https://www.elastic.co/downloads
Thank you,

[NAME]

Answers to questions from Sli.do

Will logs survive GDPR?

Author: Jonas Nordstrom
Number of votes: 18

Answer:

Logs will definitely survive the GDPR. The question is how much the GDPR affects them.

The main problem is the IPs inside them, but in the worst case you can store hashes instead of raw IPs.

If that really happens, it can affect some parts of log analysis (for example, you can't block suspicious IPs based on your logs, because you won't know which real IP a hash belongs to). But I still think there will be some changes to the GDPR and logs will not be affected by this law.
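
As an illustration of that worst case, here is a minimal Python sketch for storing a salted one-way hash of the visitor IP instead of the raw address; the salt value and the function name are only assumptions for the example. You can still group and count requests by the hashed value, but you can no longer recover or block the original IP.

import hashlib

SALT = b"change-me-to-a-secret-value"  # assumption: keep your real salt secret

def pseudonymize_ip(ip):
    # One-way hash: the same IP always maps to the same token,
    # so grouping still works, but the raw IP cannot be read back.
    return hashlib.sha256(SALT + ip.encode("utf-8")).hexdigest()

print(pseudonymize_ip("203.0.113.42"))  # example address from a documentation range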

What's your secret to growing 1 cm a year as an adult man? #diagnosisdoctor

Author: Someone from Belgium
Number of votes: 11

Answer:

Eh :-D I think that's from the slide with my patient medical history, right?

The secret is to eat only vegetables and optimize your growth every year :-)

At what point of SEO strategy/execution do you recommend looking into logs? Is it a priority?

Author: Petr
Number of votes: 8

Answer:

I think the best way to work with logs is to work with them continuously. For example, if you use a real-time log analysis tool, you can set up monitoring there that lets you know whenever something important happens on your website, or create dashboards and check them periodically.

But they should definitely be part of an SEO audit.
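
If your tool has no built-in alerting, even a small script run from cron can act as a basic monitor. A minimal sketch, assuming an Apache/Nginx combined log format and an access.log path (both are assumptions to adjust), that warns when too many Googlebot requests return 5xx:

import re

LOG_PATH = "access.log"  # assumption: point this at your real access log
# Capture the status code and the quoted user agent at the end of the line.
LINE = re.compile(r'" (\d{3}) (?:\d+|-).*"([^"]*)"$')

total = errors = 0
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE.search(line)
        if not match or "Googlebot" not in match.group(2):
            continue
        total += 1
        if match.group(1).startswith("5"):
            errors += 1

if total and errors / total > 0.05:  # alert threshold of 5% is an assumption
    print(f"Warning: {errors}/{total} Googlebot requests returned 5xx")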

Can we detect click fraud from log files?

Author: D
Number of votes: 7

Answer:

I'm very sorry, but I don't know. I'm not an expert in PPC and I have no experience with click fraud detection.

You said logs are a good resource not just for SEO consultants but for all marketers. Can you name some processes where others can use them?

Author: Anonymous
Number of votes: 7

Answer:

Yes, they are. I won't go through every online marketing specialization here, so let's take PPC specialists as an example:

They could filter the logs for URLs with the gclid parameter or specific UTM parameters and check the responses. That way they can see the status codes of all the landing pages used in their ads, without any AdWords script or anything else, right inside the logs with just a few clicks in ELK (for example).
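
Outside ELK, the same check can be sketched in a few lines of Python; the combined log format and the access.log path are assumptions for the example. It tallies ad landing pages (URLs carrying gclid or utm_ parameters) by status code and prints the non-2xx ones:

import re
from collections import Counter

LOG_PATH = "access.log"  # assumption: point this at your real access log
LINE = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+" (\d{3})')

statuses = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE.search(line)
        if not match:
            continue
        url, status = match.groups()
        if "gclid=" in url or "utm_" in url:
            statuses[(status, url.split("?")[0])] += 1

# Non-2xx landing pages are the ones likely wasting PPC budget.
for (status, path), hits in sorted(statuses.items(), reverse=True):
    if not status.startswith("2"):
        print(status, path, hits)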

Or they could check the behavior of Google's ads crawler: AdWords has its own crawler that identifies itself as AdsBot-Google (check out more Googlebot user agents here).


I find the CZ SEO market looks like a very closed community where you have to know people who know people... How would you describe the SEO community in CZ?

Author: Anonymous
Number of votes: 5

Answer:

I have to agree. I think the Czech Republic is full of very skilled marketers, but we're a little bit afraid of foreign markets and especially of foreign languages.

One reason could be Seznam.cz, a search engine specific to the Czech market: you can discuss it only with other Czech SEO specialists or people from Slovakia. Another reason could be the slower rollout of Google features in the Czech market. People in the US see and work with a lot of Google features right now, but most of them aren't available in the Czech market yet.

But in my personal opinion the main reason is the language barrier.

Do you compare data from Webmaster Tools with log data? For example, in the case of 404 status codes?

Author: Miroslav Ficza
Number of votes: 5

Answer:

Usually not, but what you can do with logs is check the real requests from Googlebot that hit 404s and fix them. After that you can mark all the crawl errors inside Google Search Console as fixed and wait for new crawl errors.

If you still have some problems with internal linking, Googlebot will report them again. The reason people sometimes see a lot of crawl errors (to "non-existing" pages) inside GSC is that many of these URLs are actually just saved in the URL scheduler on Google's side.
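
A minimal sketch of that first step, again assuming a combined log format and an access.log path: list the URLs Googlebot actually requested that returned 404, most frequent first, so the most wasteful ones get fixed first.

import re
from collections import Counter

LOG_PATH = "access.log"  # assumption: point this at your real access log
LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" 404 .*"([^"]*)"$')

hits = Counter()
with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        match = LINE.search(line)
        if match and "Googlebot" in match.group(2):
            hits[match.group(1)] += 1

for url, count in hits.most_common(20):
    print(count, url)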

Are you sure bad robots pay attention to your robots.txt disallowance?

Author: Anonymous
Number of votes: 5

Answer:

I'm sure they don't :-) I was talking there about user agents from tools you don't use and you know you won't use in the future (like Ahrefs, Majestic, etc.). In that case the robots.txt disallowance is enough and you can save a lot of server traffic.

But in case you identify a real "bad bot", just block its IP on your server.
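
For reference, a minimal robots.txt block for that first case; AhrefsBot and MJ12bot (Majestic) are the user-agent tokens those tools document for their crawlers, but double-check them against each tool's documentation before relying on this:

User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /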