Data

From glossaLAB
Collection GlossariumBITri
Author José María Díaz-Nafría
Mario Pérez-Montoro
Mehrad Golkhosravi
Editor Mario Pérez-Montoro
Year 2010
Volume 1
Number 1
ID 29
Object type Concept
Domain Communication Theory
Computation Science
Epistemology
Information Science
Transdisciplinary
es dato
fr donnée
de Daten/Angabe

Intuitively, we can identify data as physical events (small parts or pieces of reality) able to carry certain associated information. They have a material nature and can be considered the physical support of information. In other words, they are physical facts that do not have any inherent meaning, do not necessarily present any interpretations or opinions, and do not carry indicative characteristics that may reveal their importance or relevance. In this sense, each of the statements printed in this article can be considered as data. The customer's name, the amount of the purchase or the bank transaction number appearing on an invoice are typical examples of data in the context of companies.

In an effort to systematize, we can propose the following definition:

Data = physical support of information.

Data in the organizational context

It is important to point out some characteristics of data from this viewpoint. Firstly, being physical events, data are easy to capture, structure, quantify or transfer. Secondly, a datum can be conventional or natural (non-conventional), depending on the coding key in which it is involved (as discussed below). The account number on the back of a credit card is an example of conventional data. The clouds that appear in the sky just before a storm are an example of natural or non-conventional data. Thirdly, the same data may or may not inform an agent, as we see below, depending on the agent's prior stock of knowledge. Fourthly, within an organization, data are usually of the conventional type and often appear as a collection of alphanumeric characters materialized in a document (either physical or electronic). And finally, also within organizations, the indiscriminate accumulation of data does not necessarily improve decision making.

We can justify this way of defining the concept of data by reviewing how the same concept is understood in other contexts. For example, our characterization reflects the sense given to the concept of data in the disciplines of information technology and telecommunications: a set of characters associated with a concept. The character set “35,879,987”, associated with the concept of national identification (ID) number, could be an example.

In the same vein, our proposal fits well with the use of the word “data” in the definition of certain computer applications. A database management system (DBMS), to go no further, is usually defined as a resource that enables the management of records based on the data or sets of characters appearing in them (numbers, words, etc.). That is to say, one can defend the idea that the record management performed by these tools is of a syntactic type (based on the character sets that appear in the records) rather than a semantic one (based on the information content associated with these sets of characters). A DBMS, given a search equation, retrieves the records in which the data making up the equation appear. In the same way, a Data Mining or Text Mining system, among other things, makes it possible to detect correlations or patterns among the data (or sets of characters) appearing in the records that make up the system, so that someone can later decide, in an intellectual manner, whether or not a pattern corresponds to a genuine semantic correlation.
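
The syntactic character of this kind of retrieval can be illustrated with a minimal Python sketch. The records, the field names and the `search` function are hypothetical, introduced only for illustration; they do not correspond to any particular database management system.

```python
# Hypothetical records: each one is just a collection of character sets.
RECORDS = [
    {"id": 1, "customer": "Ana Ruiz", "national_id": "35,879,987"},
    {"id": 2, "customer": "Luis Vega", "purchase_amount": "35,879,987"},
    {"id": 3, "customer": "Eva Sanz", "national_id": "40,112,305"},
]


def search(records, query):
    """Return the ids of the records in which the query string appears.

    The match is purely syntactic: characters are compared with characters,
    with no regard for what they stand for.
    """
    return [
        record["id"]
        for record in records
        if any(query in str(value) for value in record.values())
    ]


matches = search(RECORDS, "35,879,987")
print(matches)  # [1, 2]
```

Both records are retrieved, although the same characters stand for an identification number in one and for a purchase amount in the other. Deciding whether the coincidence is meaningful is left to the person who reads the results.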

Floridi's Model

a) Diaphoric definition of data (DDD). According to Floridi's diaphoric definition (from the Greek διαφορά, difference, discrepancy), “a datum is a putative fact regarding some difference or lack of uniformity within some context.”

According to the author, this definition can be applied at three levels: 1) Diaphora de re: as a lack of uniformity in the outside world, i.e. pure data, before any epistemological interpretation (similar to Euclid's “dedomena”). 2) Diaphora de signo: between at least two physical states. 3) Diaphora de dicto: between two symbols.

Depending on the stance taken regarding ontological neutrality and the nature of environmental information, (1) can be identical to (2), or be what makes possible the signs in (2), while those signs are necessary conditions for encoding the symbols in (3).
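
As a rough illustration of the idea of a datum as a lack of uniformity, the following Python sketch looks for differences between adjacent symbols in a sequence (roughly, the de dicto level). The function name `differences` and the sample strings are assumptions made for this example and are not part of Floridi's account.

```python
def differences(symbols):
    """Return the positions at which two adjacent symbols differ.

    Each position is a candidate datum in the diaphoric sense:
    a lack of uniformity within the given context (the sequence).
    """
    return [
        i for i in range(1, len(symbols))
        if symbols[i] != symbols[i - 1]
    ]


uniform = "0000000"  # perfectly uniform context: no difference, no datum
marked = "0001000"   # one symbol breaks the uniformity

print(differences(uniform))  # []
print(differences(marked))   # [3, 4]
```

The sketch says nothing about what the differing symbol means; it only registers that a difference exists, which is all the diaphoric definition requires of a datum.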

This definition has the advantage of leaving data free from their support, and it involves four types of independence or neutrality: taxonomic (with regard to the classification of the relata), typological (with regard to the logical type), ontological (with regard to the nature of the support of the inequality) and genetic (with regard to the semantics of the informee).

In turn, these four types of neutrality have important implications regarding the nature of information and data:

According to the taxonomic neutrality, there is nothing that can characterize a datum per se. Consequently, data are purely relational entities.

According to the typological neutrality, information may consist of different types of data as relata: primary, secondary, metadata, operational or derivative (see below, §2.b).

According to the ontological neutrality, in combination with the rejection of information without data (as stated by the General Definition of Information that the author proposes), there can be no information “without data representation”. This, in turn, may imply different levels of ontological neutrality: 1) there can be no information without physical implementation (regardless of its nature); 2) every element in the physical world “derives its function, its meaning, its very existence from the apparatus-elicited answer to yes-or-no questions, binary choices, bits” (i.e., what we call reality derives from a theoretical-interrogative analysis); 3) information is nothing but an “exchange with the outer world as we adjust to it, and make our adjustment felt upon it” (Wiener, 1954); 4) “information is a difference that makes a difference”, so that meaning becomes a potential basis according to its self-generating ability.

According to the genetic neutrality, semantics can be independent of the informee, thus meaning does not have to be in the mind of the user; which is not the same as the realist thesis, stating that the meaning would even be independent of the producer or informer. This latter assumption is made when “environmental information” is considered.

b) Types of data. Data can be of different types supported by the diaphoric definition:

Primary ~: those that are explicitly related to what is in question (e.g. the response of an information system to the query of a user).

Secondary ~: equivalent to the absence of certain primary data (e.g. administrative silence in response to a given petition or request).

Operational ~: those data relating to the operations and the overall performance of the information system (e.g. an indication that the system is not working properly or that it is busy).

Derivative ~: those data that can be used as indirect sources in inquiries different from those directly or primarily addressed by the data themselves (e.g. “The fact that someone has mentioned the sun twice is a sign that he is in a good mood”).

Meta ~: information about the nature and characteristics of other data, usually primary data (e.g. “What he/she is saying is a lie”, “this text is stored in an extended ASCII code”, “no errors have been detected in the data received”...).
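
The five types can be written down as a small enumeration. This is a minimal Python sketch for illustration only: the class name `DataType`, the sample log and its hand-assigned labels are assumptions, not part of Floridi's model.

```python
from enum import Enum


class DataType(Enum):
    """The five types of data listed above, each with an example from the text."""
    PRIMARY = "the response of an information system to a user's query"
    SECONDARY = "administrative silence in response to a petition"
    OPERATIONAL = "an indication that the system is busy"
    DERIVATIVE = "mentioning the sun twice as a sign of a good mood"
    META = "this text is stored in an extended ASCII code"


# Hypothetical log of an interaction with an information system,
# labelled by hand with the type each item would fall under.
log = [
    ("query result: 3 records found", DataType.PRIMARY),
    ("no reply received", DataType.SECONDARY),
    ("system busy, try again later", DataType.OPERATIONAL),
    ("encoding: extended ASCII", DataType.META),
]

for item, kind in log:
    print(f"{kind.name.lower():<12} {item}")

# Select only the items that directly answer the question asked.
primary_items = [item for item, kind in log if kind is DataType.PRIMARY]
```

The labels are assigned by hand on purpose: whether an item counts as primary, derivative or meta depends on the inquiry being made, so the same character string could fall under a different type in another context.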

References

  • BOISOT, Max H. (1998). Knowledge Assets. Oxford: Oxford University Press.
  • DAVENPORT, T.; PRUSAK, L. (1998). Working Knowledge. Boston: Harvard Business School Press.
  • FLORIDI, L. (2005). “Semantic Conceptions of Information”. Stanford Encyclopedia of Philosophy, electronic edition [Online]. <http://plato.stanford.edu/entries/information-semantic/> [Accessed: Sept. 2008].
  • NONAKA, Ikujiro and TAKEUCHI, Hirotaka (1995). The Knowledge Creating Company. Oxford: Oxford University Press.
  • PÉREZ-MONTORO GUTIÉRREZ, Mario (2008). Gestión del Conocimiento en las Organizaciones. Gijón: Trea. ISBN 978-84-9704-376-2.
  • PÉREZ-MONTORO GUTIÉRREZ, Mario (2007). The Phenomenon of Information. Lanham (Maryland): Scarecrow Press.
  • PÉREZ-MONTORO GUTIÉRREZ, Mario (2003). “El documento como dato, conocimiento e información”. [Online]. Tradumática, no. 2, 2003. <http://www.fti.uab.es/tradumática/revista> [Accessed: 30 Dec. 2003].
  • von KROGH, Georg, ICHIJO, Kazuo and NONAKA, Ikujiro (2000). Enabling Knowledge Creation. Oxford: Oxford University Press.