
[Infographic] Skills mapping: 4 elements that influence its level of accuracy

Skills mapping is a necessary preliminary step before linking different HR objects together. Nevertheless, it is important to keep in mind that some elements can influence the level of accuracy of this AI-powered approach. Here is a closer look at four of them.

The number of skills requested from the system


The number of skills requested from the system influences the level of accuracy of the skills mapping.


The skills returned by the algorithm are ranked from the most relevant to the least relevant. The accuracy of the first result is therefore very high, and each additional skill lowers the accuracy level by a few percentage points.


However, the results remain very acceptable up to 10 skills: a sample of that size keeps the average accuracy rate above 70%. We therefore recommend limiting queries to 10 skills.
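As an illustration of the 10-skill recommendation, here is a minimal sketch of how a ranked result could be truncated on the client side. It assumes the mapping returns (skill name, relevance score) pairs; the function name `top_skills`, the constant `MAX_SKILLS` and the input shape are hypothetical and do not describe an actual API.

```python
from typing import List, Tuple

# Upper limit recommended in the text; the constant name is hypothetical.
MAX_SKILLS = 10


def top_skills(scored: List[Tuple[str, float]], limit: int = MAX_SKILLS) -> List[str]:
    """Return the `limit` highest-scoring skill names, most relevant first.

    `scored` is an assumed input shape: (skill name, relevance score) pairs.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Sort by score, highest first, then keep only the first `limit` entries.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:limit]]
```

Keeping the limit as a parameter makes it easy to request fewer skills when a higher average accuracy matters more than coverage.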


The influence of this factor on the level of accuracy of the results is not surprising, since the higher the number of skills requested, the less specific they are to the subject or domain concerned.


Related: Skill mapping: how well does it work?



The type of content analyzed


The type of content affects the level of accuracy of the skills mapping.


This is because skills are easier to identify when the training data contains more content of that type, with more skills associated with it, and when the content refers to those skills more explicitly.


In the context of learning, for example, the performance of the mapping fluctuates across domains. The mapping of training content related to computer software offers very precise results. This is because the technical skills to which the content refers are generally expressed very explicitly, with few alternative ways of phrasing them (e.g. “mastery of Python”). Moreover, such courses are very numerous, which makes it easier for the mapping algorithm to identify the associated skills.


On the other hand, the mapping of training content in the health and well-being domains gives slightly less precise results. Such content generally relates to more generic hard skills, and to soft skills that can be expressed in a wide variety of ways. For example, the same skill may be referred to as “empathy” or “caring for others”, and “teamwork” and “being drawn to others” can refer to the same skill. In addition, there is less training content in these areas, so the associated skills are harder for the algorithm to identify.



The amount of text provided


The amount of text provided influences the accuracy of the skills mapping.


A minimum amount of information is required to obtain a relevant mapping. For training content, for example, we recommend providing at least 20-30 words.


Our observations have shown that the accuracy of the mapping increases progressively as the title and description of a training course grow from 1 to 20 words. Then, from 30 words onwards, the accuracy stabilizes: it no longer increases, even when the content provided contains hundreds of words.
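The 20-30 word guideline above can be checked before content is submitted for mapping. The sketch below is only an illustration: the names `word_count`, `has_enough_text` and `MIN_WORDS` are hypothetical, and counting whitespace-separated tokens is an assumption about what "words" means here.

```python
# Lower bound of the 20-30 word recommendation; the name is hypothetical.
MIN_WORDS = 20


def word_count(title: str, description: str) -> int:
    """Count whitespace-separated words in the title and description together."""
    return len(f"{title} {description}".split())


def has_enough_text(title: str, description: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when the combined text reaches the minimum word count."""
    return word_count(title, description) >= min_words
```

A course that fails this check could be flagged for a richer description rather than mapped as-is.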


However, regardless of the amount of text provided, the less “common” the words used, the lower their chance of being in the training data, and, therefore, the higher the probability that a skill is not identified correctly.


Related: Mapping and matching jobs and skills: the unsuspected opportunities



The language used


Language is among the most important factors that condition the performance of the skills mapping.


The better a language is represented in the training data, the more accurate the model's results will be in that language.


For example, if the language model used is BERT (developed by Google), the algorithm has been trained on large text corpora such as Wikipedia. In this input data, English is the most represented language, followed by European languages (German, French, Polish, Dutch, etc.) and Asian languages such as Japanese or Chinese. The model is therefore better trained in the languages that are most represented in this “training data”. As a result, the level of accuracy of the mapping is generally higher in these languages.


Finally, whatever the expected results, it is important to keep in mind that the richer the “training data” set, the better the model will perform.


To learn more, discover our latest white paper on mapping and matching performance: “Skill-driven Matching applied to Learning: 10 questions you may have about matching learners and L&D content”

