The web is 25 years old.
Much has been said about its liberating power, and that progress is undeniable. But like all human initiatives it has a flip side: the way information is organised and, ultimately, pushed to us.
This is shifting from a process led by humans (as championed by the early Yahoo!) to one led by machines.
Google uses 57 signals to tailor the information presented to you, even when you are not logged in. Fifty-seven.
Eric Schmidt claims,
“It will be very hard for people to watch or consume something that has not in some sense been tailored to them”.
There are, essentially, three broad ways to have content pushed or recommended to you:
[1] Collaborative filtering: A popular approach that gathers and analyses large datasets of user behaviour and, by comparing your profile with those of similar users, feeds the result to algorithms that recommend or push items deemed of possible interest.
The data points fall broadly into two categories: those given explicitly by a user and those the system collects on its own. One of the strengths (or weaknesses) of the approach is that the actual characteristics of the recommended items are never analysed. It’s not the content of the book or article you are reading that influences the recommendation, but how it has been interacted with (a sketch of the idea follows below).
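To make this concrete, here is a minimal sketch of user-based collaborative filtering in Python. The users, items and interaction values are invented for illustration, and real systems work with far larger, sparser datasets and more sophisticated models; the point is simply that the recommendation comes from behavioural similarity, not from the items themselves.

```python
# A minimal sketch of user-based collaborative filtering.
# Users, items and interaction values are made up for illustration.
import numpy as np

# Rows = users, columns = items; values are interaction strengths
# (e.g. ratings or click counts). 0 means "not interacted with yet".
interactions = np.array([
    [5.0, 3.0, 0.0, 1.0],   # user A
    [4.0, 0.0, 0.0, 1.0],   # user B
    [1.0, 1.0, 0.0, 5.0],   # user C
    [0.0, 1.0, 5.0, 4.0],   # user D
])

def cosine_similarity(a, b):
    """How alike two users' interaction vectors are."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def recommend(user_index, k=2):
    """Score unseen items for one user using the k most similar users."""
    target = interactions[user_index]
    sims = np.array([
        cosine_similarity(target, other) if i != user_index else -1.0
        for i, other in enumerate(interactions)
    ])
    neighbours = sims.argsort()[::-1][:k]      # the k most similar users
    scores = sims[neighbours] @ interactions[neighbours]
    scores[target > 0] = -np.inf               # never re-recommend seen items
    return int(scores.argmax())                # index of the best unseen item

print(recommend(user_index=1))  # item recommended to user B based on A and C
```

Notice that the item metadata never appears anywhere in the code: only the matrix of who interacted with what.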
[2] Content-based filtering: It does what it says on the tin! Any piece of content can be characterised by tags or metadata, and the filtering works by matching that metadata against the metadata used to profile a user.
The challenge of this approach is the need to actually wrap metadata around said content, and to do so at scale. That is a difficult task because content can mean different things to different people. One reader might tag this article “privacy” and “freedom”. Another might think tags such as “technology”, “artificial intelligence” and “recommendation tool” are appropriate. They would both be right.
Text (as opposed to physical products) offers two ways to attach that metadata: a semantic approach (analysing the words and keywords using what are known as ontologies) or a statistical one. The latter involves telling an “intelligent” machine which concept (rather than keyword) should be attached to a sample of content, then using artificial intelligence to propagate and extend that logic to a larger content set (disclaimer: my start-up does exactly this sort of thing!). A sketch of the tag-matching idea follows below.
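Here is an equally minimal sketch of content-based filtering in Python, using tags as the metadata. The articles, tags and user profile are invented for illustration; the point is that only the items’ own metadata and the user’s profile matter, not what other users did.

```python
# A minimal sketch of content-based filtering using tags as metadata.
# The articles, tags and user profile are invented for illustration.
from collections import Counter

# Each item is described purely by its metadata, not by who read it.
articles = {
    "filter-bubble": {"privacy", "freedom", "algorithms"},
    "ai-tagging":    {"technology", "artificial intelligence", "recommendation"},
    "yahoo-history": {"web", "history", "search"},
}

# The user profile is built from the tags of items they already liked.
user_profile = Counter(["privacy", "algorithms", "artificial intelligence"])

def score(tags, profile):
    """Overlap between an item's tags and the user's tag profile."""
    return sum(profile[tag] for tag in tags)

ranked = sorted(articles,
                key=lambda name: score(articles[name], user_profile),
                reverse=True)
print(ranked)  # items whose metadata best matches the profile come first
```

The quality of the recommendation here depends entirely on the quality and consistency of the tagging, which is exactly the scaling problem described above.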
[3] Hybrid system: Do I need to say more? It combines both algorithmic approaches, with the challenge of choosing the “relevant” weight behind each component to produce the best recommendation (again, a sketch follows below).
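Finally, a minimal sketch of the hybrid idea in Python: a weighted blend of a collaborative score and a content-based score. The per-item scores and the 0.6 weight are arbitrary, illustrative numbers; picking that weight well is precisely the hard part.

```python
# A minimal sketch of a hybrid recommender: a weighted blend of the two
# signals above. The weight and the per-item scores are made-up numbers.

def hybrid_score(collaborative, content_based, weight=0.6):
    """Blend the two signals; weight=1.0 ignores content, 0.0 ignores behaviour."""
    return weight * collaborative + (1 - weight) * content_based

# Hypothetical normalised scores for three candidate items.
candidates = {
    "item-a": (0.9, 0.2),   # loved by similar users, weak metadata match
    "item-b": (0.4, 0.8),   # weak behavioural signal, strong metadata match
    "item-c": (0.6, 0.6),
}

best = max(candidates, key=lambda name: hybrid_score(*candidates[name]))
print(best)  # which item wins depends entirely on the chosen weight
```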
Whilst this is only a simple and narrow overview of the technology used to filter and present what you read online, that technology is getting smarter.
That 25-year-old web is losing its youthful idealism in favour of something that warrants our vigilance. It’s worth remembering how information can be pushed to us, and even edited out.
Photo (cc) Steve Rhode on Flickr. Some rights reserved.