Crowdsourcing and Big Data

LinkedIn has turned off its Skills pages, and they will be sadly missed. Perhaps it’s true that you only appreciate something when it’s gone.

The Skills pages were very useful. They let recruiters enter one skill, and LinkedIn would instantly return a list of other skills that may, in some way, be related. This was not a synonym list, as a synonym would be a different word for the same skill. It was a list of different skills that people with the skill you were interested in were also likely to have. For example, someone who programs in the computer language PHP is likely to also have skills with the database MySQL. These are two completely different things, but LinkedIn was able to spot the trend, and a very useful trend it is.

I develop ATS/CRM software for recruitment agencies, and I started to wonder whether I could build this type of functionality into my own ATS. Even one of our smaller clients has thousands of candidates, so there is certainly enough data to analyse and find trends in. After all, all LinkedIn was doing was working out that people with the MySQL skill are also likely to have the PHP skill. How complicated could it be?

Well, it is complicated, and my short thought experiment has left me in massive admiration for what LinkedIn created, gave us, and then took away.
At this point, I should say that while I am a software developer, I am certainly not an expert in search engines or big data. If anyone reading this is an expert, I would be delighted if you could tell me how this can be done.

My thought experiment went as follows:
If I consider all the CVs in a database, could I count the number of times each word appears across all of them? That would give me a measure of how common each word is. It sounds like a big calculation, but it isn’t really, as I can build the count up over time as new CVs are added.

Then, if the user enters a keyword, like PHP, I can find all the CVs with that word in them. That’s a simple keyword search and every CV database should be able to do that in its sleep.

Finally, all I have to do is work out how common each word is in those matching CVs, then pick out the words that are significantly more common there than in my database as a whole. Job done!
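The three steps of the thought experiment can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names, the crude tokeniser, and the `min_lift` and `min_docs` thresholds are all made up for the example, and it recounts everything on each call rather than building the counts up over time.

```python
from collections import Counter
import re

def tokenize(text):
    """Lower-case a CV and return the set of distinct words in it."""
    # Crude, assumed tokeniser: letters, digits, '+' and '#' (for C++ / C#).
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def related_words(cvs, keyword, min_lift=1.5, min_docs=2):
    """Rank words that are over-represented in CVs containing `keyword`."""
    docs = [tokenize(cv) for cv in cvs]
    keyword = keyword.lower()

    # Step 1: in how many CVs does each word appear, database-wide?
    overall = Counter(word for doc in docs for word in doc)

    # Step 2: a plain keyword search.
    matches = [doc for doc in docs if keyword in doc]
    if not matches:
        return []

    # Step 3: the same count for the matching CVs only, compared
    # against the database-wide rate ("lift").
    subset = Counter(word for doc in matches for word in doc)
    results = []
    for word, n in subset.items():
        if word == keyword or n < min_docs:
            continue  # skip the keyword itself and one-off words
        lift = (n / len(matches)) / (overall[word] / len(docs))
        if lift >= min_lift:
            results.append((word, round(lift, 2)))
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))
```

On a toy database of four CVs, two of which mention both PHP and MySQL, searching for "php" surfaces "mysql" and filters out common words such as "and". Nothing here deals with the speed of doing this over thousands of real CVs.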

OK, on this last point I welcome a big data expert to step in.
Think about it: there could be thousands of matching CVs, and I’d have to count every word in them. Then I’d have to work out a frequency for each word and compare it with the same figure for the whole database. All of this has to happen practically instantly, as the user expects an instant result.

Even if I managed to do this, I’d be forgetting that words take their meaning from context. For example, the word ‘developer’ in the title ‘database developer’ means something quite different from the same word in ‘property developer’. Secondly, there is no point counting skills like IT (Information Technology), as a simple word count cannot tell it apart from the word ‘it’, which will be in every CV on the system.

So how did LinkedIn do it? They don’t have thousands of profiles to search; they have hundreds of millions! They may have started with a similar thought process to mine and, like me, come to an abrupt halt. But then they thought of the Skills system, and a very clever idea that is.

The Skills system has massive benefits for them. Firstly, there are relatively few Skills: LinkedIn does not let you use any word you want, only entries from a finite list. Secondly, most profiles have only a small number of skills. These two simplifications make the calculations dramatically simpler, almost easy, and at the same time remove my other two problems of separating ‘Database Developers’ from ‘Property Developers’ and ‘IT’ from ‘it’.
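To show why a finite skill list makes things easier, here is a rough Python sketch that counts how often pairs of skills appear on the same profile. I have no knowledge of how LinkedIn actually did it; the function names and the scoring (the share of people with one skill who also list the other) are my own assumptions.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(profiles):
    """Count profiles per skill and per unordered pair of skills."""
    skill_counts = Counter()
    pair_counts = Counter()
    for skills in profiles:
        unique = sorted(set(skills))  # sorted so each pair has one key
        skill_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return skill_counts, pair_counts

def related_skills(skill, skill_counts, pair_counts, top=5):
    """Rank other skills by the share of `skill` holders who list them."""
    total = skill_counts[skill]
    if total == 0:
        return []
    scores = []
    # A linear scan over all pairs: fine for a sketch, too slow at scale.
    for (a, b), n in pair_counts.items():
        if skill == a:
            scores.append((b, n / total))
        elif skill == b:
            scores.append((a, n / total))
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top]
```

Because each profile carries only a handful of skills, the number of pairs per profile stays small, and the counts can be updated whenever a profile changes instead of being recomputed at search time. Exact skill names also mean ‘Property Development’ never gets mixed up with ‘PHP’.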

This, however, is only part of the cleverness I am in awe of. The really clever bit came next. Imagine the design meeting when an excited LinkedIn developer explained this Skills approach as a solution to all their problems. No doubt he or she was delighted to have come to such an elegant solution. Unfortunately, their excitement was about to hit a brick wall as an even bigger problem would have been obvious to everyone else in the room.

The obvious problem at that time was how LinkedIn could create a Skill List and then expect to attribute those skills accurately to 200 million profiles. To do that they would need a workforce of 200 million people who’d be prepared to work for free, with expert knowledge of their subject and of the people whose profiles they were skilling.

As we all know, however, they did just that, or rather, they got us to do it for them. This is crowdsourcing: getting all of us to work for them for free. This is the real cleverness that I’m in awe of.


I spend my career thinking of ways to recruit smarter, then developing the tools so everyone can work that way, making that cleverness seem simple and obvious. Sometimes it’s inspirational to see other companies doing great things and making them look simple too.

PS. Holly Fawcett of Social Talent wrote a much more useful blog post about the alternatives to LinkedIn Skills. The article can be seen here.
