Block Lists

Privacy Browser uses the EasyList block lists. EasyList is formatted using the Adblock syntax.

Privacy Browser implements only part of the Adblock feature set. Some of these limitations exist because WebView does not expose the necessary functionality. In the future, with the release of Privacy WebView, some of these controls may be implemented. Other features are not implemented in the interest of performance on handheld devices.

Lines that begin with [ are headers and are ignored by Privacy Browser.

Lines that begin with ! are comments and are ignored except to extract the version number of the file.

Lines that begin with @@ are white lists and are processed according to the rules listed below.
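The prefix rules above can be sketched roughly as follows. Python is used here only for brevity; this is not Privacy Browser's code. The function names, the sample entries, and the assumption that the version comment looks like "! Version: ..." are all illustrative.

```python
def classify_line(line):
    """Classify a raw block list line by its leading characters."""
    if line.startswith("["):
        return "header"      # ignored
    if line.startswith("!"):
        return "comment"     # ignored, except for the version number
    if line.startswith("@@"):
        return "white list"
    return "black list"


def extract_version(lines):
    """Return the text after '! Version:' from the first such comment, or None."""
    prefix = "! Version:"    # assumed header format
    for line in lines:
        if line.startswith(prefix):
            return line[len(prefix):].strip()
    return None


# Hypothetical sample lines, not taken from a real list.
sample = [
    "[Adblock Plus 2.0]",
    "! Version: 201801011200",
    "@@example.com/allowed.js",
    "example.com/ads/",
]
for line in sample:
    print(classify_line(line), "<-", line)
print("version:", extract_version(sample))
```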

Lines that begin with | match against the beginning of the URL.

Lines that end with | (or have entries that end with |) match against the end of the URL.
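As a minimal sketch of the two anchor rules, assuming an entry carries at most one leading and one trailing | (the function name and the sample entries are hypothetical):

```python
def matches_anchored(entry, url):
    """Match an entry that may have a leading and/or trailing | anchor."""
    starts = entry.startswith("|")
    ends = entry.endswith("|")
    text = entry[1:] if starts else entry
    text = text[:-1] if ends else text

    if starts and ends:
        return url == text            # anchored at both ends
    if starts:
        return url.startswith(text)   # initial match
    if ends:
        return url.endswith(text)     # final match
    return text in url                # unanchored: match anywhere


print(matches_anchored("|http://ads.", "http://ads.example.com/x.js"))
print(matches_anchored(".swf|", "http://example.com/movie.swf"))
print(matches_anchored(".swf|", "http://example.com/movie.swf?x=1"))
```

The third call does not match because the URL continues past .swf.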

Lines that contain \ are regular expressions. Checking a regular expression against a URL is relatively expensive in terms of CPU consumption. Luckily, EasyList only contains a small number of regular expressions.
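One way to keep the cost of regular expressions down is to compile each one once when the list is parsed and reuse it for every URL. The sketch below assumes that approach; the helper names and the sample pattern are hypothetical, and stripping the surrounding / delimiters is an assumption about how the entries are written.

```python
import re


def is_regular_expression(entry):
    """Treat any entry that contains a backslash as a regular expression."""
    return "\\" in entry


def compile_entry(entry):
    """Compile an entry once so it can be reused for every URL check."""
    # Assumption: regular expression entries may be wrapped in / delimiters.
    if len(entry) >= 2 and entry.startswith("/") and entry.endswith("/"):
        entry = entry[1:-1]
    return re.compile(entry)


pattern = compile_entry(r"/banner\d+\.gif/")
print(pattern.search("http://example.com/banner42.gif") is not None)
print(pattern.search("http://example.com/banner.gif") is not None)
```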

Lines that contain * could all be processed as regular expressions, but that would be a significant performance issue. Instead, they are processed in the following way:

  1. If the entry also contains \ it is processed as a regular expression.
  2. If the entry begins or ends with * the wildcard character is stripped out. These are redundant as the Adblock syntax defines example.com, *example.com, example.com*, and *example.com* as being the same. In all cases, any URL that contains example.com will be blocked.
  3. If the entry contains text separated by * it is broken into segments and each one is checked against the URL. For example, if the entry is adserver.com/*.jpg the URL will be checked to see if it contains both adserver.com/ and .jpg. This could match against a few URLs not intended by the original block entry. For example, it would match against http://adserver.com/123.jpg but also other URLs where the segments come in a different order, like http://otherserver.com/123.jpg?somevaluethatincludesadserver.com/. But the number of false positives should be small and the benefit in processing speed is significant.
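Steps 2 and 3 can be sketched as follows, using the example entry and URLs from the text. The function names are hypothetical, and entries that contain \ are assumed to have been routed to the regular expression path already.

```python
def split_wildcard_entry(entry):
    """Return the literal segments of an entry that contains *.

    Splitting on * and dropping empty pieces handles both cases at once:
    a leading or trailing * leaves an empty piece that is discarded, and
    an interior * separates two segments.
    """
    return [segment for segment in entry.split("*") if segment]


def matches_segments(entry, url):
    """True if every segment appears somewhere in the URL, in any order."""
    return all(segment in url for segment in split_wildcard_entry(entry))


print(split_wildcard_entry("*example.com*"))
print(split_wildcard_entry("adserver.com/*.jpg"))
print(matches_segments("adserver.com/*.jpg", "http://adserver.com/123.jpg"))
# The false positive described above: both segments are present, in reverse order.
print(matches_segments(
    "adserver.com/*.jpg",
    "http://otherserver.com/123.jpg?somevaluethatincludesadserver.com/"))
```

Checking segments with plain substring tests ignores their order, which is where the false positives come from.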

Lines that contain $ include filter options. Differentiating many of these would require access to more information than Android’s WebView exposes, so they are ignored. The only two filter options that are processed are domain and third-party.

Filter options preceded by ~ are inverse filter options. All of these are ignored except ~domain, which white lists the domain but applies the filter to other domains. Lines that contain ~third-party are ignored because they are typically used to block the loading of some type of resource (script, xmlhttprequest, etc.) from the main domain. Because Privacy Browser is not able to differentiate between these resource types, applying the block would block all resources from the domain, rendering it completely unusable. This is something that might be addressed in the future.

Lines that begin with @@ and contain ~domain filters are ignored because it is uncertain exactly what the two mean when combined.
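The option rules above might be sketched like this. The function name, the dictionary layout, and the sample entries are hypothetical. Two simplifications are assumed: the line is split at the first $, and unsupported options are dropped while the rest of the line is kept.

```python
def parse_filter_options(line):
    """Split 'pattern$opt1,opt2' and keep only domain and third-party.

    Returns None when the whole line should be ignored.
    """
    pattern, _, option_text = line.partition("$")
    parsed = {"pattern": pattern, "third_party": False,
              "domains": [], "inverse_domains": []}

    for option in option_text.split(","):
        if option == "~third-party":
            return None                      # lines with ~third-party are ignored
        if option == "third-party":
            parsed["third_party"] = True
        elif option.startswith("domain="):
            for domain in option[len("domain="):].split("|"):
                if domain.startswith("~"):
                    parsed["inverse_domains"].append(domain[1:])
                else:
                    parsed["domains"].append(domain)
        # Any other option (script, image, ...) is dropped in this sketch.

    if line.startswith("@@") and parsed["inverse_domains"]:
        return None                          # white list plus ~domain is ignored
    return parsed


print(parse_filter_options("ads.js$third-party,domain=example.com|~shop.example.com"))
print(parse_filter_options("tracker.js$~third-party,script"))
print(parse_filter_options("@@ads.js$domain=~example.com"))
```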

Some lines contain ^, which is a separator wildcard that matches against :, /, ?, =, and &. These wildcards are ignored, which could lead to a small number of false positives as example.com^ will match against intended URLs like http://example.com/ad.jpg and http://example.com:8443/ad.jpg, but also unintended URLs like http://example.computer.com/coolresource. Once again, the performance gain is worth the very small number of false positives.
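To illustrate the trade-off, the sketch below compares dropping the ^ with a stricter check that requires one of the separator characters listed above. Both function names are hypothetical, and the strict version is shown only for contrast; it relies on a regular expression, which is the cost the simpler approach avoids.

```python
import re

SEPARATORS = "[:/?=&]"  # the separator characters listed above


def matches_ignoring_separator(entry, url):
    """Drop every ^ and fall back to a plain substring check."""
    return entry.replace("^", "") in url


def matches_with_separator(entry, url):
    """Stricter and slower: require a separator character where ^ appears."""
    pattern = re.escape(entry).replace(re.escape("^"), SEPARATORS)
    return re.search(pattern, url) is not None


for url in ("http://example.com/ad.jpg",
            "http://example.com:8443/ad.jpg",
            "http://example.computer.com/coolresource"):
    print(url,
          matches_ignoring_separator("example.com^", url),
          matches_with_separator("example.com^", url))
```

Only the last URL differs between the two checks, which is the false positive described above.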

Lines that contain ##, ###, ##., #?#, and #@# are element hiding rules (or exceptions to them), which select HTML elements by class, id, or name using CSS selectors. Android’s WebView does not expose these types of controls, so the lines are ignored by Privacy Browser.
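Skipping these lines could look roughly like the following. The function name and sample entries are hypothetical.

```python
# "###" and "##." both contain "##", so three markers cover all five forms.
ELEMENT_HIDING_MARKERS = ("##", "#?#", "#@#")


def is_element_hiding(line):
    """True if the line is an element hiding rule that should be skipped."""
    return any(marker in line for marker in ELEMENT_HIDING_MARKERS)


for line in ("example.com##.ad-banner", "###sidebar-ad", "example.com/ads/"):
    print(is_element_hiding(line), "<-", line)
```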

The raw entries from the block lists are processed into the following 22 ArrayLists used by Privacy Browser.  Each block list has its own set of ArrayLists, which are checked in the following order.

  1. Main White List
  2. Final White List
  3. Domain White List
  4. Domain Initial White List
  5. Domain Final White List
  6. Third-Party White List
  7. Third-Party Domain White List
  8. Third-Party Domain Initial White List
  9. Main Black List
  10. Initial Black List
  11. Final Black List
  12. Domain Black List
  13. Domain Initial Black List
  14. Domain Final Black List
  15. Third-Party Black List
  16. Third-Party Initial Black List
  17. Third-Party Domain Black List
  18. Third-Party Domain Initial Black List
  19. Regular Expression Black List
  20. Domain Regular Expression Black List
  21. Third-Party Regular Expression Black List
  22. Third-Party Domain Regular Expression Black List

Before a web page loads a resource, the resource is checked against the enabled block lists in the following order:

  1. EasyList
  2. EasyPrivacy
  3. Fanboy’s Annoyance List
  4. Fanboy’s Social Blocking List

If a resource matches against a white list entry, the resource is allowed by that block list and checking moves on to the following block list.  If a resource matches against a black list entry, the loading of the resource is blocked by Privacy Browser and no more checking is performed.

White list entries on one list override black list entries on the same list but not on subsequent lists. For example, if a resource is allowed by a white list entry on EasyList, checking will move on to EasyPrivacy. If the same resource is blocked by a black list entry on EasyPrivacy, the loading of the resource will be blocked by Privacy Browser.
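The checking order can be sketched as below. For brevity each block list is reduced to one white list and one black list of plain substrings rather than the 22 lists described earlier; the function name and all entries are hypothetical.

```python
def check_resource(url, block_lists):
    """Return the name of the block list that blocks url, or None if allowed.

    block_lists is an ordered list of (name, white_entries, black_entries).
    """
    for name, white_entries, black_entries in block_lists:
        if any(entry in url for entry in white_entries):
            continue          # allowed by this list; move on to the next list
        if any(entry in url for entry in black_entries):
            return name       # blocked; no more checking is performed
    return None


block_lists = [
    ("EasyList", ["cdn.example.com/"], ["/ads/"]),
    ("EasyPrivacy", [], ["tracker.js"]),
]

# White listed by EasyList, but still blocked by EasyPrivacy.
print(check_resource("http://cdn.example.com/ads/tracker.js", block_lists))
# White listed by EasyList and not matched by EasyPrivacy, so it is allowed.
print(check_resource("http://cdn.example.com/ads/banner.gif", block_lists))
```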

Android stores all assets (like the block list data) in compressed files in the APK.  On a Nexus 6P, decompressing and parsing the block lists when Privacy Browser starts takes about 3 seconds.  This is longer than I would like, but I am not sure at the moment how to shorten it.

On a Nexus 6P, checking a resource URL against the lists takes about 20-30 milliseconds, which is fast enough to be unnoticeable.