Skill mapping consists of using technology to associate one or more skills with a piece of content. This is a necessary preliminary step before any HR objects can be linked together. But how effective is this AI-based approach? Does some content yield more accurate results than other content? Here are some explanations.
How do we measure the effectiveness of skill mapping?
To quantify the performance of skill mapping – as with any system that relies on AI – one needs “ground truth” data. This data corresponds to a set of known input-output pairs (often hand-annotated by humans) that serve as references for the algorithms.
For example, in the case of learning content, this training data can be generated manually from course titles and descriptions. These elements are usually sufficient for a person to indicate which skills or knowledge the training content covers and, conversely, which it does not.
The resulting list, which indicates whether or not each skill or piece of knowledge relates to the training content, then serves as a reference to train the mapping algorithm to recognize similarities and non-similarities between skills and learning content.
Afterwards, the effectiveness of the mapping technology can be measured by comparing the algorithm's results with the manually annotated list.
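As an illustration, this comparison boils down to computing precision and recall against the annotated reference. Here is a minimal sketch; the skill names and annotations are hypothetical, not taken from any real mapping system:

```python
# Minimal sketch: evaluating a skill-mapping algorithm against a
# manually annotated "ground truth" list (all skill names are hypothetical).

def precision_recall(predicted, annotated):
    """Compare the algorithm's predicted skills with the human-annotated set."""
    predicted, annotated = set(predicted), set(annotated)
    true_positives = predicted & annotated
    precision = len(true_positives) / len(predicted) if predicted else 0.0
    recall = len(true_positives) / len(annotated) if annotated else 0.0
    return precision, recall

# Ground truth: skills a human annotator linked to one course description.
annotated = {"python", "data analysis", "sql"}
# Skills the mapping algorithm returned for the same course.
predicted = {"python", "sql", "project management"}

p, r = precision_recall(predicted, annotated)
print(f"precision={p:.2f}, recall={r:.2f}")  # → precision=0.67, recall=0.67
```

Precision tells us how many of the suggested skills were correct; recall tells us how many of the truly relevant skills were found. Both views matter when judging a mapping system.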
What elements affect the level of accuracy of skill mapping?
The number of skills requested, the type of content, the amount of text provided, and the language all affect the accuracy of skill mapping. Here’s why.
→ The number of skills requested
The level of accuracy of skill mapping depends on the number of skills requested: the higher the number of skills requested, the lower the accuracy.
Thus, when a single skill is requested, the level of accuracy of the results is very high. Then, as the number of skills requested increases, the results lose precision.
This finding is not surprising; it is in fact quite logical: the higher the number of skills requested, the less specific they are to the subject or field concerned.
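This effect can be made concrete with a precision-at-k measure over a ranked list of suggested skills: as k grows, less relevant suggestions enter the list and precision drops. A minimal sketch, with hypothetical skill names:

```python
# Sketch: precision@k over a ranked list of suggested skills.
# As more skills are requested (larger k), less specific suggestions
# enter the list, so precision tends to decrease.

def precision_at_k(ranked_skills, relevant, k):
    """Fraction of the top-k suggested skills that are actually relevant."""
    top_k = ranked_skills[:k]
    return sum(1 for skill in top_k if skill in relevant) / k

# Hypothetical ranked output of a mapper for one course.
ranked = ["python", "sql", "statistics", "leadership", "negotiation"]
# Annotated ground truth for that course.
relevant = {"python", "sql", "statistics"}

for k in (1, 3, 5):
    print(f"precision@{k} = {precision_at_k(ranked, relevant, k):.2f}")
# → precision@1 = 1.00, precision@3 = 1.00, precision@5 = 0.60
```

The top-ranked suggestions are the most specific to the content, which is exactly why requesting only a few skills yields the most accurate results.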
→ The type of content
The level of accuracy of skill mapping varies according to the type of content analyzed: the more examples of a content type (and its associated skills) appear in the training data, and the more the content refers to specific technical skills, the more accurate the results.
Thus, if we take the case of learning content again, the accuracy of the mapping fluctuates according to the field concerned. For example, mapping learning content related to computer software yields very precise results: the technical skills it refers to are generally expressed in a very explicit way, since there is no other way to express them, and this content is highly represented in the training data.
Conversely, the mapping of learning content related to the health and well-being fields yields less precise results: it generally refers to more generic skills and soft skills that can be expressed in a wide variety of ways, and there is less learning content related to these domains.
→ The amount of text provided
The level of accuracy of the skill mapping is partly conditioned by the amount of text provided: a minimum number of words is required to obtain relevant results.
In the context of learning again, accuracy increases gradually from 1 to 20 words. After about 30 words, it stabilizes and there is no additional gain, even when the content provided contains hundreds of words. Course titles and descriptions must therefore contain at least 20 to 30 words to obtain an effective mapping.
However, regardless of the number of words provided, keep in mind that the less common the words used, the less likely they are to appear in the training data, and thus the higher the probability that a skill will not be recognized correctly.
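In practice, a mapping pipeline can flag content that is too short before attempting any mapping. A minimal sketch, using the 20-word minimum mentioned above (the threshold and function name are illustrative assumptions):

```python
# Sketch: flag course descriptions that are too short for reliable mapping.
# The 20-word minimum follows the observation above; accuracy plateaus
# at around 30 words, so anything shorter risks imprecise results.

MIN_WORDS = 20

def is_mappable(title: str, description: str) -> bool:
    """Return True if the combined text is long enough for effective mapping."""
    word_count = len((title + " " + description).split())
    return word_count >= MIN_WORDS

# A 9-word title + description falls below the threshold.
print(is_mappable("Intro to SQL", "Learn the basics of querying databases."))
```

Such a check costs almost nothing and lets content authors enrich short descriptions before they degrade mapping quality.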
→ The language
The level of accuracy of skill mapping is impacted by the language: the more the language is represented in the “ground truth” data, the more accurate the model can be in that language.
As an example, the BERT language model (developed by Google) was trained on large bodies of text such as Wikipedia. These sources are mostly written in English, so the model provides its most accurate results in that language.
To learn more, discover our latest white paper on mapping and matching performance: “Skill-driven Matching applied to Learning: 10 questions you may have about matching learners and L&D content”
Illustration credits: https://www.istockphoto.com/fr/portfolio/Irina_Strelnikova