In information science, profiling refers to the process of constructing and applying user profiles generated by computerized data analysis.

Profiling involves the use of algorithms or other mathematical techniques that allow the discovery of patterns or correlations in large quantities of data aggregated in databases. When these patterns or correlations are used to identify or represent people, they can be called profiles. Unlike a discussion of profiling technologies or population profiling, the notion of profiling in this sense is not just about the construction of profiles, but also concerns the application of group profiles to individuals, e.g., in the cases of credit scoring, price discrimination, or identification of security risks.

Profiling is not simply a matter of computerized pattern recognition; it enables refined price discrimination, targeted servicing, fraud detection, and extensive social sorting. Real-time machine profiling constitutes the precondition for emerging socio-technical infrastructures envisioned by advocates of ambient intelligence, autonomic computing and ubiquitous computing.

One of the most challenging problems of the information society is dealing with increasing data overload. With the digitization of all sorts of content, as well as the improvement and falling cost of recording technologies, the amount of available information has become enormous and increases exponentially. It has thus become important for companies, governments, and individuals to separate information from noise and detect useful or interesting data. The development of profiling technologies must be seen against this background. These technologies are intended to collect and analyze data efficiently in order to find or test knowledge in the form of statistical patterns in the data. This process, called Knowledge Discovery in Databases (KDD), provides the profiler with sets of correlated data usable as “profiles”.

Types of profiling practices

To clarify the nature of profiling technologies, some crucial distinctions must be made between different types of profiling practices, apart from the distinction between the construction and the application of profiles. The main distinctions are those between top-down and bottom-up profiling (or supervised and unsupervised learning), and between individual and group profiles.

Supervised and unsupervised learning

Profiles can be classified according to the way they have been generated. On the one hand, profiles can be generated by testing a hypothesized correlation. This is called top-down profiling or supervised learning. It is similar to the methodology of traditional scientific research in that it starts with a hypothesis and consists of testing its validity. The result of this type of profiling is the verification or refutation of the hypothesis. One could also speak of deductive profiling. On the other hand, profiles can be generated by exploring a database, using the data mining process to detect patterns that were not previously hypothesized. In a way, this is a matter of generating hypotheses: finding correlations one did not expect or even think of. Once the patterns have been mined, they enter the profiling loop and are tested against new data. This is called bottom-up profiling or unsupervised learning.
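
The contrast between the two approaches can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the toy records, the correlation threshold of -0.5, and the helper names `pearson` and `two_means` are all hypothetical and do not come from any particular profiling system. The top-down part tests a correlation that was hypothesized in advance; the bottom-up part lets a simple two-cluster grouping emerge from the data without any prior hypothesis.

```python
from statistics import mean

# Hypothetical toy records: (monthly_purchases, late_payments) per customer.
records = [(2, 5), (3, 4), (4, 4), (10, 1), (11, 0), (12, 1)]
purchases = [r[0] for r in records]
late = [r[1] for r in records]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Top-down: test a hypothesis stated before looking at the data,
# here "frequent buyers pay late less often" (threshold is an assumption).
r = pearson(purchases, late)
hypothesis_supported = r < -0.5


def two_means(values, iterations=10):
    """Bottom-up: split values into two clusters (1-D k-means with k=2)."""
    c1, c2 = min(values), max(values)
    g1, g2 = list(values), []
    for _ in range(iterations):
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        c1, c2 = mean(g1), mean(g2)
    return g1, g2


# No hypothesis is supplied; the grouping is found in the data itself.
low_group, high_group = two_means(purchases)
```

In this sketch the clusters found bottom-up would then serve as candidate hypotheses to be tested against new data, which is the step the paragraph above describes.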

Two things are important about this distinction. First, unsupervised learning algorithms seem to allow the construction of a new type of knowledge, based not on hypotheses developed by a researcher, nor on causal or motivational relations, but exclusively on stochastic correlations. Second, unsupervised learning algorithms thus seem to allow for an inductive type of knowledge construction that does not require theoretical justification or causal explanation.

Some authors claim that if the application of profiles based on computerized stochastic pattern recognition ‘works’, i.e. allows for reliable predictions of future behaviors, the theoretical or causal explanation of these patterns no longer matters. However, the idea that ‘blind’ algorithms provide reliable information does not imply that the information is neutral. In the process of collecting and aggregating data into a database (the first three steps of the process of profile construction), translations are made from real-life events to machine-readable data. These data are then prepared and cleansed to allow for initial computability. Potential bias will have to be located at these points, as well as in the choice of the algorithms that are developed. It is not possible to mine a database for all possible linear and non-linear correlations, which means that the mathematical techniques developed to search for patterns determine which patterns can be found. In the case of machine profiling, potential bias is not informed by common-sense prejudice or what psychologists call stereotyping, but by the computer techniques employed in the initial steps of the process. These techniques are mostly invisible to those to whom profiles are applied (because their data match the relevant group profiles).

Individual and group profiles

Profiles must also be classified according to the kind of subject they refer to. This subject can be either an individual or a group of people. When a profile is constructed with the data of a single person, this is called individual profiling. This kind of profiling is used to discover the characteristics of a certain individual, to enable unique identification or the provision of personalized services. However, personalized servicing is most often also based on group profiling, which allows a person to be categorized as a certain type of person because that person's data match a profile constructed from massive amounts of data about massive numbers of other people. A group profile can refer to the result of data mining in data sets that refer to an existing community that considers itself as such, like a religious group, a tennis club, a university, or a political party. In that case it can describe previously unknown patterns of behavior or other characteristics of such a group (community). A group profile can also refer to a category of people who do not form a community but are found to share previously unknown patterns of behavior or other characteristics. In that case the group profile describes specific behaviors or other characteristics of a category of people, for instance women with blue eyes and red hair, or adults with relatively short arms and legs. These categories may be found to correlate with health risks, earning capacity, mortality rates, credit risks, etc.

If an individual profile is applied to the individual that it was mined from, then that is direct individual profiling. If a group profile is applied to an individual whose data match the profile, then that is indirect individual profiling, because the profile was generated using data of other people. Similarly, if a group profile is applied to the group that it was mined from, then that is direct group profiling. However, insofar as the application of a group profile to a group implies the application of the group profile to individual members of the group, it makes sense to speak of indirect group profiling, especially if the group profile is non-distributive.
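
The difference between direct and indirect individual profiling can be sketched in a few lines of Python. This is a hedged illustration only: the spending figures, the names, the matching tolerance, and the helper functions `individual_profile`, `group_profile` and `matches` are all hypothetical.

```python
from statistics import mean

# Hypothetical data: weekly spending observed for three known customers.
history = {
    "alice": [40, 42, 38],
    "bob": [41, 39, 43],
    "carol": [90, 95, 88],
}


def individual_profile(person):
    """Profile mined from one person's own data (average weekly spending)."""
    return mean(history[person])


def group_profile(members):
    """Profile mined from the pooled data of several people."""
    return mean(v for m in members for v in history[m])


def matches(observed, profile, tolerance=5):
    """Assumed matching rule: observation lies within a fixed tolerance."""
    return abs(observed - profile) <= tolerance


# Direct individual profiling: Alice's profile is applied to Alice herself.
alice_expected = individual_profile("alice")

# Indirect individual profiling: a profile mined from Alice and Bob is
# applied to Dave, a newcomer whose first observation matches that profile.
low_spenders = group_profile(["alice", "bob"])
dave_first_week = 44
dave_predicted = low_spenders if matches(dave_first_week, low_spenders) else None
```

Note that Dave's prediction rests entirely on other people's data, which is what makes the profiling indirect.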

Distributive and non-distributive profiling

Group profiles can also be divided in terms of their distributive character. A group profile is distributive when its properties apply equally to all the members of its group: all bachelors are unmarried, or all persons with a specific gene have an 80% chance of contracting a specific disease. A profile is non-distributive when it does not necessarily apply to all the members of the group: the group of persons with a specific postal code has an average earning capacity of XX, or the category of persons with blue eyes has an average chance of 37% of contracting a specific disease. Note that in this case the chance that an individual has a given earning capacity or contracts the specific disease will depend on other factors, e.g. sex, age, background of parents, previous health, education. It should be obvious that, apart from tautological profiles like that of bachelors, most group profiles generated by means of computer techniques are non-distributive. This has far-reaching implications for the accuracy of indirect individual profiling based on data matching with non-distributive group profiles. Quite apart from the fact that the application of accurate profiles may be unfair or cause undue stigmatization, most group profiles will not be accurate when applied to an individual member.
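
A small worked example in Python may make the consequence for individuals concrete. The figures below are invented for illustration: four hypothetical residents of one postal code, and a hypothetical list of bachelors. The sketch shows a non-distributive profile (an average) that describes the group correctly yet fits none of its members closely, next to a distributive profile that holds for every member.

```python
from statistics import mean

# Hypothetical incomes of four residents sharing one postal code.
incomes = {"p1": 20_000, "p2": 22_000, "p3": 24_000, "p4": 94_000}

# Non-distributive group profile: the average earning capacity of the group.
group_average = mean(incomes.values())

# Error made when the group average is attributed to each individual member.
errors = {person: abs(value - group_average) for person, value in incomes.items()}

# Distributive group profile: the property holds for every single member.
bachelors = [
    {"name": "q1", "married": False},
    {"name": "q2", "married": False},
]
distributive = all(not person["married"] for person in bachelors)
```

Here the group average is 40,000, but every individual is at least 16,000 away from it, so applying the group profile to any single resident gives a poor estimate even though the profile is accurate for the group as a whole.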

Application domains

Profiling technologies can be applied in a variety of domains and for a variety of purposes. These profiling practices all have different effects and raise different issues.

Knowledge about the behavior and preferences of customers is of great interest to the commercial sector. Based on profiling technologies, companies can predict the behavior of different types of customers. Marketing strategies can then be tailored to the people fitting these types. Examples of profiling practices in marketing are customer loyalty cards, customer relationship management in general, and personalized advertising.

In the financial sector, institutions use profiling technologies for fraud prevention and credit scoring. Banks want to minimize the risks in giving credit to their customers. Based on extensive group profiling, customers are assigned a score that indicates their creditworthiness. Financial institutions like banks and insurance companies also use group profiling to detect fraud or money laundering. Databases of transactions are searched with algorithms to find behaviors that deviate from the standard, indicating potentially suspicious transactions.
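
Searching for behavior that deviates from the standard can be sketched with a simple outlier rule. The example below is an assumption-laden illustration, not a description of how any institution actually screens transactions: the amounts are invented, the function name `flag_deviations` is hypothetical, and the rule (distance from the median measured in units of the median absolute deviation, with a cut-off of 5) is just one of many possible choices.

```python
from statistics import median

# Hypothetical transaction amounts for one account.
amounts = [100, 105, 98, 102, 99, 101, 5000]


def flag_deviations(values, cutoff=5):
    """Return values lying far from the median, measured in MAD units.

    MAD is the median absolute deviation; the cutoff is an assumed threshold.
    """
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # All typical values are identical; flag anything that differs.
        return [v for v in values if v != centre]
    return [v for v in values if abs(v - centre) / mad > cutoff]


suspicious = flag_deviations(amounts)
```

A median-based rule is used here because a single extreme transaction would inflate a mean and standard deviation and could hide itself; a flagged amount is only a candidate for review, not evidence of fraud.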

In the context of employment, profiles can be used for tracking employees by monitoring their online behavior, for detecting fraud committed by employees, and for deploying human resources by pooling and ranking employees' skills.

Profiling can also be used to support people at work and in learning, by informing the design of adaptive hypermedia systems that personalize the interaction. For instance, this can be useful for supporting the management of attention.

In forensic science, the possibility exists of linking different databases of cases and suspects and mining these for common patterns. This could be used for solving existing cases or for the purpose of establishing risk profiles of potential suspects.

Risks and issues

Profiling technologies have raised a host of ethical, legal and other issues, including privacy, equality, due process, security and liability. Numerous authors have warned against the affordances of a new technological infrastructure that could emerge based on semi-autonomic profiling technologies.

About the author: Scott Bernstein is the CEO of Global Security International LLC, headquartered in the Research Triangle of North Carolina. He has extensive experience as a Counterterrorist Consultant, International Apprehension Operative, Human & Sex Trafficking Expert and a Military and Law Enforcement Trainer. He is available as a Consultant and as a Speaker.

