Alexa Top 1 Million Analysis - August 2017

It's time for the 5th instalment of my Alexa Top 1 Million scan and this time around there's another new metric in the data.


Previous Crawls

I've done 4 previous crawls: Aug 2015, Feb 2016, Aug 2016 and Feb 2017. I'm also publishing my daily crawl data, which is available here for further analysis by the community. Let's dig into the latest data!


Aug 2017

To start off with the good news, things are continuing to get better!


aug-2017-results


One of the biggest changes since the last scan has to be the enormous jump in the number of sites deploying HPKP. This is rather interesting for many reasons, not least because just last week I announced that I'm giving up on HPKP...



The increase in HPKP is almost entirely caused by Tumblr deploying HPKP across their entire catalogue of sites. Whilst the number of their sites in the top 1 million has changed since I first noticed this, there's still a huge jump of over 3,000 sites.

Another big win in the scan this time around is the continued growth in the deployment of HTTPS. We're really seeing a continuation of the awesome progress being made here and this was confirmed by Adrienne Porter Felt and April King recently in their talk 'Measuring HTTPS Adoption on the Web' at USENIX (slides).


https-aug-2018-1


Each time I conduct these scans we can clearly see huge progress in how fast HTTPS is being deployed across the web. Another thing that's really important is that adoption isn't just continuing, it's accelerating. I've noticed this increase in the rate of adoption in previous scans and I'm really excited to see it again.


percent-redirect-https


Security Headers

One of the original purposes of my scans was to determine the adoption of various HTTP security headers and I'm still tracking good progress in that area too. We've seen increases in usage across the board and some of them are quite significant. I'd like to think that securityheaders.io is at least helping to drive adoption and education about these headers.


security-headers-aug-2017


What's really odd is that the trend for X-XSS-Protection (XXSSP) and X-Content-Type-Options (XCTO) is still there! The presence of all other headers decreases as you go down the ranking except these two, and to this day there isn't a solid explanation for this. As I mentioned above, the raw data from my daily scans is available, so please do dig into the data and see if you can identify why this trend exists.
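For anyone digging into the raw data, checking which of the tracked headers a single response carries could look something like the sketch below. It is only an illustration: the header names come from the stats list in this article, while `TRACKED_HEADERS`, `present_headers` and the example response are hypothetical and are not the crawler's actual code.

```python
# Header names taken from the stats list in this article. The list and
# function names are hypothetical, not taken from the crawler itself.
TRACKED_HEADERS = [
    "strict-transport-security",
    "content-security-policy",
    "public-key-pins",
    "x-content-type-options",
    "x-frame-options",
    "x-xss-protection",
    "referrer-policy",
]


def present_headers(response_headers):
    """Return the tracked security headers found in a response.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    seen = {name.lower() for name in response_headers}
    return [header for header in TRACKED_HEADERS if header in seen]


# A made-up response that only sets the two headers discussed above.
example = {
    "Server": "nginx",
    "X-Content-Type-Options": "nosniff",
    "X-XSS-Protection": "1; mode=block",
}
print(present_headers(example))
```

Counting results like these per ranking bucket is one way to reproduce the per-header trend lines from the daily data.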


Let's Encrypt

I've now been tracking the adoption of Let's Encrypt over 18 months and they too have seen some great progress in their adoption in the Alexa Top 1 Million.


lets-encrypt-aug-2017


The low usage in the very top ranked sites is still present but across the rest of the ranking they've seen significant growth. My guess is that sites right near the top probably have established commercial agreements with a CA but we may see them shifting over time, albeit more slowly.


EV Certificates

After a recent debate on Twitter between various parties about the use of EV certificates, I decided to add tracking to my crawler for the type of certificates used by sites in the top 1 million. It's interesting that the use of EV certs follows the same trend line as most of the other metrics that I track.


ev-certificate-aug-2017


As you can see the usage of EV certs is much higher in the higher ranked sites and tails off much in the same way that most other metrics do. I'm sure there will be various arguments for why this is the case but my guess is that sites near the top have a higher budget so the cost of EV is less significant to them and worth a shot for any potential benefits.


General Stats

You should check over the raw data that I make available if you want to dig into specifics but this is a nice overview of a few of the stats that my crawlers now collect.


Total Rows: 890204 

Security Headers Grades:
A+	106
A	1101
B	4562
C	36279
D	48231
E	78285
F	721571
R	69 

Sites using strict-transport-security: 65244 
Sites using content-security-policy: 17437 
Sites using content-security-policy-report-only: 1297 
Sites using x-webkit-csp: 439 
Sites using x-content-security-policy: 1154 
Sites using public-key-pins: 3508 
Sites using public-key-pins-report-only: 99 
Sites using x-content-type-options: 104099 
Sites using x-frame-options: 110391 
Sites using x-xss-protection: 82551 
Sites using x-download-options: 9696 
Sites using x-permitted-cross-domain-policies: 9390 
Sites using access-control-allow-origin: 29601 
Sites using referrer-policy: 1615 

Sites redirecting to HTTPS: 273837 
Sites using Let's Encrypt certificate: 63843 

Top 10 Server headers:
 Apache	189985
 nginx	145853
 cloudflare-nginx	93246
 Microsoft-IIS/8.5	31985
 Microsoft-IIS/7.5	29442
 LiteSpeed	19560
 nginx/1.12.1	16369
 GSE	15565
 Apache/2.4.7 (Ubuntu)	11094
 Apache/2.2.15 (CentOS)	10985 

Top 10 TLDs:
.com	439781
.ru	45256
.net	44956
.org	41717
.de	22581
.jp	18791
.br	14158
.uk	14123
.ir	12322
.in	11788 

Top 10 Certificate Issuers:
Let's Encrypt Authority X3	63842
COMODO RSA Domain Validation Secure Server CA	37827
COMODO ECC Domain Validation Secure Server CA 2	30170
Go Daddy Secure Certificate Authority - G2	22479
RapidSSL SHA256 CA	12438
Amazon	7087
DigiCert SHA2 High Assurance Server CA	6191
GeoTrust SSL CA - G3	5812
AlphaSSL CA - SHA256 - G2	5550
Symantec Class 3 Secure Server CA - G4	4849 

Top 10 Protocols:
TLSv1.2	253949
TLSv1	8266
TLSv1.1	177
NULL	0 

Top 10 Cipher Suites:
ECDHE-RSA-AES256-GCM-SHA384	113309
ECDHE-RSA-AES128-GCM-SHA256	79256
ECDHE-ECDSA-AES128-GCM-SHA256	31843
ECDHE-RSA-AES256-SHA384	13991
DHE-RSA-AES256-GCM-SHA384	4344
ECDHE-RSA-AES256-SHA	3048
DHE-RSA-AES256-SHA	2919
AES128-SHA	2072
AES256-SHA	1977
AES256-SHA256	1941 

Top Key Sizes:
RSA	2048 bit	212830
ECDSA	256 bit	32070
RSA	4096 bit	16942
RSA	1024 bit	293
RSA	3072 bit	142
ECDSA	384 bit	81
RSA	8192 bit	6
RSA	4056 bit	3
RSA	3248 bit	3
RSA	2058 bit	2 
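The counts above are easier to compare as shares of the whole scan. The following is a minimal sketch using only the Security Headers Grades, Total Rows and HTTPS redirect figures from the list above; the `share` helper and variable names are made up for illustration.

```python
# Counts copied from the stats list above.
grades = {
    "A+": 106, "A": 1101, "B": 4562, "C": 36279,
    "D": 48231, "E": 78285, "F": 721571, "R": 69,
}
total_rows = 890204
redirecting_to_https = 273837

# The grade counts should account for every row in the scan.
assert sum(grades.values()) == total_rows


def share(count, total):
    """Percentage of total, rounded to two decimal places."""
    return round(100 * count / total, 2)


for grade, count in grades.items():
    print(f"{grade:>2}: {share(count, total_rows):6.2f}%")

print(f"Redirecting to HTTPS: {share(redirecting_to_https, total_rows)}%")
```

On these figures roughly 81% of the sites scanned receive an F grade, and just under 31% redirect to HTTPS.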


Other Observations

There are a few other nice things that I've noticed whilst looking over the data here that I think are worth pointing out.

As I mentioned above Let's Encrypt have seen tremendous growth in the top 1 million sites, but they're actually really close to becoming the biggest issuing CA! In the Feb 2017 scan Comodo had 46,466 certificates issued and Let's Encrypt had 31,030. Now in the Aug 2017 scan Comodo has 67,997 (the two Comodo issuers listed above combined) and Let's Encrypt has 63,842. Given the rate at which Let's Encrypt are closing that gap they will very soon become the largest issuing CA in the Alexa Top 1 Million!
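To put numbers on how quickly that gap is closing, here is a small sketch. The Feb 2017 figures are the ones quoted in the paragraph above. The Aug 2017 Comodo total is an assumption: it is taken as the sum of the two Comodo issuers in the Top 10 Certificate Issuers list (37,827 + 30,170). The function names are hypothetical.

```python
# Feb 2017 figures are quoted in the text. The Aug 2017 Comodo total is an
# assumption: the sum of the two Comodo issuers in the Top 10 list.
feb = {"Comodo": 46466, "Let's Encrypt": 31030}
aug = {"Comodo": 37827 + 30170, "Let's Encrypt": 63842}


def gap(scan):
    """How many more certificates Comodo has than Let's Encrypt."""
    return scan["Comodo"] - scan["Let's Encrypt"]


def growth(ca):
    """Aug 2017 count as a multiple of the Feb 2017 count."""
    return aug[ca] / feb[ca]


print("Gap in Feb 2017:", gap(feb))
print("Gap in Aug 2017:", gap(aug))
for ca in feb:
    print(f"{ca}: {growth(ca):.2f}x")
```

Under that assumption the gap shrinks from 15,436 to 4,155 certificates, with Let's Encrypt roughly doubling between the two scans.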

Another awesome development is the increase in the use of ECDSA keys in certificates instead of RSA. The Feb 2017 scans saw RSA 2048 bit keys number 146,817 whilst ECDSA 256 bit keys were 20,046. Looking at the data from Aug 2017 we can see that RSA 2048 bit keys are 212,830 with ECDSA 256 bit keys at 32,070. To put that another way, in Feb 2017 the number of ECDSA 256 bit keys was 13.7% of the number of RSA 2048 bit keys, and by Aug 2017 that had increased to 15.1%.
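The two percentages above match the ECDSA 256 bit count divided by the RSA 2048 bit count. The sketch below reproduces them from the quoted figures and, for comparison, also works out ECDSA's share of the two key types combined; the function names are made up for illustration.

```python
# Key counts quoted in the paragraph above.
feb = {"rsa_2048": 146817, "ecdsa_256": 20046}
aug = {"rsa_2048": 212830, "ecdsa_256": 32070}


def ecdsa_vs_rsa(scan):
    """ECDSA 256 bit count as a percentage of the RSA 2048 bit count."""
    return 100 * scan["ecdsa_256"] / scan["rsa_2048"]


def ecdsa_share(scan):
    """ECDSA 256 bit count as a percentage of both key types combined."""
    return 100 * scan["ecdsa_256"] / (scan["ecdsa_256"] + scan["rsa_2048"])


for label, scan in (("Feb 2017", feb), ("Aug 2017", aug)):
    print(f"{label}: {ecdsa_vs_rsa(scan):.1f}% of the RSA 2048 count, "
          f"{ecdsa_share(scan):.1f}% of both key types combined")
```

Whichever denominator is used, the direction is the same: ECDSA grows relative to RSA between the two scans.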

The protocol support also surprised me a little with some of the changes there. As expected we've seen a huge jump in the number of sites using TLSv1.2 from 171,723 to 253,949. Again as expected we've seen a decrease in the use of TLSv1.1 from 208 sites to 177 sites. What did surprise me was that we've seen an increase in the number of sites using TLSv1.0 from 7,945 to 8,266. Remember, these are sites that can't negotiate a higher protocol version with me and there's really no reason that we shouldn't be seeing 100% TLSv1.2 support in the top 1 million.
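Because the number of sites negotiating TLS at all grew so much between scans, it can help to look at each protocol's share as well as its raw count. This is a rough sketch using only the counts quoted above; the denominator is simply the sum of the three protocol counts, which is an assumption about how to normalise, and the `shares` helper is hypothetical.

```python
# Protocol counts quoted in the paragraph above.
feb = {"TLSv1.2": 171723, "TLSv1.1": 208, "TLSv1": 7945}
aug = {"TLSv1.2": 253949, "TLSv1.1": 177, "TLSv1": 8266}


def shares(scan):
    """Each protocol's percentage of all negotiated connections in a scan."""
    total = sum(scan.values())
    return {proto: round(100 * n / total, 2) for proto, n in scan.items()}


feb_shares = shares(feb)
aug_shares = shares(aug)
for proto in feb:
    print(f"{proto}: {feb_shares[proto]}% -> {aug_shares[proto]}%")
```

Measured this way, the TLSv1.0 count went up but its share went down, from roughly 4.4% to roughly 3.2% of the sites that negotiated one of these three versions.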

The last few things to quickly note are that Nginx is closing the gap on Apache as the most popular server choice, Cloudflare have seen a significant increase in their presence and this is the first report where no sites in the top 1 million negotiated SSLv3 with my crawler!


Raw Data

As always the raw data from my scans is available here.

By making the raw data available I'm hoping that others will be able to conduct further analysis or use it to further their own research. I dump the data from the crawlers every day so there's a lot to go at!

You can also view the Google Sheet with all of the data and graphs I've used throughout this article and all of my previous articles here.

Update

I will be sending a few tweets with other bits of information that I've found and embedding them below.


About Scott Helme
United Kingdom Website
Security researcher, entrepreneur and international speaker who specialises in web technologies.