CONTENT ANALYSIS,
quantitative analysis of texts and text arrays for the purpose of subsequent meaningful interpretation of identified numerical patterns.
The basic idea of content analysis is simple and intuitive. When we perceive a text, and especially a large stream of texts, we sense quite well that different formal and substantive components are represented in it to varying degrees, and that this degree is at least partly measurable: its measure is the share of the total volume that the components occupy and/or the frequency of their occurrence. “The theme Y runs through all of X’s speeches”; “X constantly addressed problem Y in his speech”; “He never missed an opportunity to kick Z”; “There he goes again, playing the same old tune”. All these expressions, whose number could easily be increased, testify to our awareness that the information flow pouring over us contains persistently repeated themes, images, references to problems, assessments, statements (“Carthage must be destroyed” or “the Russian economy is suffocating without investment”), arguments, formal constructions, specific names, etc. Moreover, just as in mechanics we feel acceleration rather than speed, so in perceiving a text we are especially aware of the dynamics of its content: those cases when, for example, someone suddenly stops being scolded or begins to be scolded, or when some new topic suddenly appears in texts.
The idea of content analysis is to systematize these intuitive impressions, make them visible and verifiable, and develop a methodology for the targeted collection of the textual evidence on which they are based. It is assumed that a researcher armed with such a technique will be able not only to organize his impressions and make his conclusions better justified, but even to learn more from the text than its author wanted to say. The persistent repetition of certain topics in the text, or the use of certain characteristic formal elements or constructions, may go unnoticed by the author yet be discovered and interpreted by the researcher. Hence the half-joking definition of content analysis, due to the sociologist A.G. Zdravomyslov, as “a scientifically based method of reading between the lines.”
In fact, the main distinguishing feature of content analysis is not the “systematicity” and “objectivity” declared in many definitions (these features are also inherent in other methods of text analysis) but its quantitative nature. Content analysis is primarily a quantitative method that involves a numerical assessment of some components of the text, which can be supplemented by various qualitative classifications and by the identification of certain structural patterns. The most successful definition of content analysis can therefore be considered the one given in the relatively recent book by Manheim and Rich: content analysis is the systematic numerical processing, evaluation and interpretation of the form and content of an information source.
From the point of view of linguists and computer scientists, content analysis is a typical example of applied information analysis of a text: it boils down to extracting, from the whole variety of information available in the text, those components that are of specific interest to the researcher and presenting them in a form convenient for perception and subsequent analysis. The numerous specific variations of content analysis differ in what these components are and in what exactly is meant by the text.
The specific applied goals of content analysis also vary widely. Back in 1952, the American researcher B. Berelson formulated 17 goals, which have since been reproduced in textbooks on content analysis. Among them are: describing trends in the content of communication processes; describing differences in the content of communication processes in different countries; comparing different media; identifying the propaganda techniques used; determining the intentions and other characteristics of the participants in communication; determining the psychological state of individuals and/or groups; identifying the attitudes, interests and values (and, more broadly, the belief systems and “models of the world”) of various population groups and public institutions; identifying the focus of attention of individuals, groups and social institutions, etc.
Historically, content analysis is the earliest systematic approach to the study of text. The very first content-analytical experiment mentioned in the literature (and one whose applied purpose seems very recognizable) was an analysis, carried out in Sweden in the 18th century, of a collection of 90 church hymns that had passed state censorship and gained great popularity but were accused of not conforming to religious dogma. Whether they conformed was determined by counting religious symbols in the texts of these hymns and comparing them with other religious texts, in particular the texts of the Moravian Brethren, which were prohibited by the church. At the end of the 19th and the beginning of the 20th century, the first content-analytical studies of mass media texts appeared in the United States. Their motivation looks surprisingly familiar: the authors set out to demonstrate the unfortunate “yellowing” of the New York press of the day. The 1930s and 1940s saw studies now recognized as classics of content analysis, primarily those of H. Lasswell, whose work continued in the post-war years. During the Second World War, perhaps the most famous episode in the history of content analysis took place: British analysts, working together with the Americans, predicted when Germany would begin to use V-1 cruise missiles and V-2 ballistic missiles against Great Britain, basing the prediction on an analysis of internal propaganda campaigns in Germany.
Since the 1950s, content analysis as a research method has been actively used in almost all sciences that in one way or another practice the analysis of text sources - in the theory of mass communication, in sociology, political science, history and source studies, in cultural studies, literary studies, applied linguistics, psychology and psychiatry. The variety of specific projects implemented over the approximately 70-year history of intensive use of content analysis is very large. Among the interesting projects carried out in recent years in Russia are the study of images and metaphors used in 1996–1997 during the discussion about the national idea that was then unfolding in the Russian press, as well as the analysis of texts of the left-nationalist opposition carried out in the same period. Local content analytical projects are periodically implemented during various kinds of sociological monitoring – national and regional.
Content analysis is most widely used in the theory of mass communication, political science and sociology. This partly explains why the term is sometimes used as a general name for all methods of systematic, purportedly objective analysis of political texts and of texts circulating in the channels of mass communication. Such a broad understanding of content analysis is not justified, however. There are a number of research methods that are either specifically designed for the analysis of political texts (for example, the cognitive mapping method) or applicable and applied for this purpose (for example, the semantic differential method, or various approaches involving the study of the structure of the text and the mechanisms of its impact), and these cannot be reduced to standard content analysis even on its broadest understanding.
Nevertheless, content analysis does occupy a special place among analytical methods due to the fact that it is the most technologically advanced among them and therefore most suitable for systematic monitoring of large information flows. In addition, content analysis is flexible enough so that a very diverse range of specific types of research can be successfully “fitted” into its framework. Finally, being fundamentally a quantitative method (albeit containing a considerable qualitative component), content analysis is to a certain extent amenable to formalization and computerization.
TYPES OF INFORMATION ARRAYS AND UNITS OF CONTENT ANALYSIS
The basis of content analysis is counting the occurrence of certain components in the analyzed information array, supplemented by identifying statistical relationships and analyzing structural connections between them, as well as providing them with certain quantitative or qualitative characteristics. It is clear from this that the main premise of content analysis is figuring out what to count; in other words, defining the units of analysis.
These units, depending on the purposes of the analysis, the type of information array, and a number of additional reasons, can be (and actually are) very diverse. They are subject to two natural but, unfortunately, usually poorly compatible requirements. On the one hand, they should be easily and, if possible, unambiguously identifiable in the text; ideally, their identification should be fully algorithmic. This requirement is clearly best satisfied by formal elements of the text, or by elements that have clearly defined and unambiguous formal correspondences, for example words.
On the other hand, units of content analysis most often require a certain subjective, and also context-dependent, significance that makes their distribution and the dynamics of such distribution diagnostic for identifying changes in individual and social consciousness, belief systems, etc. – in other words, the units must be interesting for subsequent (political science, cultural studies, sociological, etc.) interpretation. Meanwhile, such units (for example, topics) are of a purely meaningful nature, and their mention in the text can be carried out in many different ways. Their identification in the general case involves semantic analysis of the text, the problem of automation of which, despite many years of efforts by linguists and programmers, is far from being solved.
A description of the units of content analysis must be preceded by a brief consideration of the nature of the information array analyzed. Nothing in the definition of the content analysis method prevents its application to a single text; moreover, examples of such analysis are known. Nevertheless, there are a number of reasons why the object of content analytical projects is usually not a single text, even a substantial one, but an information array or information flow consisting of a large number of texts. Firstly, statistical patterns appear more clearly the larger the sample. Secondly, most of the goals of content analysis predispose it to comparison: analysts are most often interested not in one-time snapshots but in the dynamics of change, and when they do take snapshots, these are as a rule “variegated” ones, reflecting, for example, different media or the consciousness of different social groups. Finally, for all the variety of content analysis units discussed below, the most popular are various macro-units: themes and/or problems, propositions, images and ideologemes. There are usually few of these in individual texts, especially in short media texts, and new macro-units do not appear very often, so their dynamics can only be assessed over a long period or through a broad “horizontal comparison.”
Thus, the very idea of content analysis presupposes the analysis of large information arrays; on the other hand, its relatively low cost and ease of routine implementation make such an analysis feasible. It is therefore not surprising that the history of content analysis includes such projects as the analysis of 427 school textbooks, 481 private conversations, 4,022 advertising slogans, 8,039 (in 1938) and 19,533 (in 1952) editorial articles, or 15,000 characters in 1,000 hours of television airtime.
The specific variety of content analysis units is almost limitless, but several main types can be distinguished among them. (The classification given below is based on K. Krippendorff’s typology, but differs from it quite significantly.)
Empirical research methods
This type is based on empirical, that is, sensory perception, as well as on measurement using instruments. This is an important component of scientific research in all fields of knowledge from biology to physics, from psychology to pedagogy. It helps to determine the objective laws in accordance with which the phenomena under study occur.
The following empirical research methods in coursework and other student works can be called basic or universal, because they are relevant for absolutely all areas of knowledge.
Studying various sources of information. This is nothing more than the basic collection of information, that is, the stage of preparation for writing a master's thesis or course work. The information you will rely on can be taken from books, the press, regulations and, finally, the Internet. When searching for information, remember that not everything you find is reliable (especially on the Internet), so treat it critically and check whether materials from different sources confirm one another.
Analysis of the information received. This is the stage that follows the collection of information. It is not enough just to find the necessary material; you also need to analyze it carefully and check it for logic, reliability and relevance.
Observation. This method is a focused and attentive perception of the phenomenon under study, followed by the collection of information. For observation to bring the desired results, you need to prepare for it in advance: make a plan, outline the factors that require special attention, clearly define the timing and objects of observation, and prepare a table that you will fill out during the work.
Experiment. If observation is a rather passive research method, an experiment requires active intervention on your part. To conduct an experiment or a series of experiments, you create certain conditions in which you place the object of research. You then observe the reaction of the object and record the results in the form of a table, graph or diagram.
Survey. This method helps you look deeper into the problem being studied by putting specific questions to the people involved. The survey is used in three variations: an interview, a conversation and a questionnaire. The first two are oral and the last is written. After completing the survey, you need to present its results clearly in the form of text, a chart, a table or a graph.
A. “Physical” units.
These are understood as entities with clearly defined physical, geometric or temporal boundaries, such as copies of a book, newspaper issues, copies of posters or leaflets, photographs, etc. Identifying and counting them is not particularly difficult, but the need for such a count arises quite rarely; counting, say, leaflets or books is most often done in order to assess how widely some topic or evaluation is represented, i.e. what is actually used are units of the other types characterized below, usually conceptual, propositional or thematic.
B. Structural-semiotic units.
By these we mean the basic elements of semiotic systems (see SEMIOTICS). In the case of natural language these are:
– the vocabulary of the language (words and their equivalents, for example the expression “railway” or the term “content analysis”, i.e. what is recorded in dictionaries), and
– grammatical indicators (for example, negative particles or markers of categories such as, say, verbal nouns).
Quantitative counting of the occurrence of words in a text is perhaps the simplest version of content analysis, yet it can often produce interesting results. Most often, of course, it is the “interesting” or “key” words and/or phrases that are counted, for example: the names of value categories such as “freedom”, “stability”, “trust”, “territorial integrity”; scenarios such as “betrayal” or “disappointment”; fairly unambiguous designations of socially significant phenomena, for example “corruption”, “crime” or “terrorism”; significant attributes such as “tough”, “decisive”; emotionally charged evaluative vocabulary such as “destructive”, “uncontrollable”, “vile”, “nightmarish”, “misanthropic”; password words (also often emotionally charged) such as “patriots”, “communofascists”, “mondialists” or “white idiots”; and words that were strongly activated at a particular point in time, like “Family” or “Mabetex” in the early autumn of 1999, the same “Family” and “Media-Most” at the end of spring 2000 in Russia, or “terrorism” in many countries of the world in the autumn of 2001.
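The simplest version described here, counting key words and phrases, can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names `keyword_frequencies` and `per_thousand_words` and the sample sentence are hypothetical, and matching is done on exact word forms, so an inflected language would additionally need lemmatization.

```python
import re
from collections import Counter

def keyword_frequencies(text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword or phrase."""
    lowered = text.lower()
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        counts[kw] = len(re.findall(pattern, lowered))
    return counts

def per_thousand_words(counts, text):
    """Normalize raw counts by text length so that texts of different size are comparable."""
    total_words = len(re.findall(r"\w+", text))
    if total_words == 0:
        return {}
    return {kw: 1000 * n / total_words for kw, n in counts.items()}

# Hypothetical sample text, invented for illustration only.
sample = ("Stability requires trust. Without trust there is no stability, "
          "and corruption destroys trust.")
counts = keyword_frequencies(
    sample, ["stability", "trust", "corruption", "territorial integrity"]
)
rates = per_thousand_words(counts, sample)
```

Normalizing by the total number of words reflects the point made at the outset: the measure of a component is its share of the total volume and/or its frequency, not the raw count alone.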
Content analysis of grammatical categories is a rather rare research endeavor. Its impetus is the (very plausible) hypothesis that the author of a text controls the use of grammatical forms to a lesser extent than the use of vocabulary, so that grammar can serve as a source of information about the author that he had no intention of making available to his readers. In political psychology there is a special research methodology, the so-called analysis of cognitive complexity, which is in effect a content-analytic procedure and allows one to draw conclusions about how simple (or, on the contrary, complex) the author's vision of the political situation is and how it changes over time. The units of content analysis underlying the assessment of cognitive complexity are, for example: categorical quantifiers such as “always”, “never”, “every”, which are opposed by quantifiers like “sometimes”, “some”, etc.; categorical assessments of truth (like the famous “unequivocally”) as opposed to cautious ones such as “it is possible” or “it cannot be excluded that”; linguistic means of differentiated consideration of a situation, such as “on the one hand... on the other hand”; and mentions of “interaction”, “balance”, “interdependence”, “compromise”, etc.
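The opposition between categorical and cautious markers lends itself to a simple count. The sketch below is not the actual cognitive-complexity methodology of political psychology; it is a minimal illustration that assumes two small, hypothetical marker lists drawn from the examples in the text and reports the share of “differentiating” markers among all markers found.

```python
import re

# Hypothetical marker lists, based only on the examples given in the text.
CATEGORICAL = ["always", "never", "every", "unequivocally"]
DIFFERENTIATING = [
    "sometimes", "some", "it is possible",
    "on the one hand", "on the other hand",
    "interaction", "balance", "interdependence", "compromise",
]

def count_markers(text, markers):
    """Total number of whole-word, case-insensitive matches for a list of markers."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(m) + r"\b", lowered)) for m in markers
    )

def differentiation_share(text):
    """Share of differentiating markers among all markers; None if no marker occurs."""
    categorical = count_markers(text, CATEGORICAL)
    differentiating = count_markers(text, DIFFERENTIATING)
    total = categorical + differentiating
    if total == 0:
        return None
    return differentiating / total
```

A value close to 0 would point to a categorical, black-and-white presentation, and a value close to 1 to a more differentiated one; tracking the value across texts from different dates gives the change over time mentioned above.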
There are also known examples of content analysis of purely grammatical means: studies of the ratio of verb forms denoting processes and results respectively, the study of nominalized constructions (with verbal nouns such as “construction”, “strengthening”, etc.) in the language of party documents of the Brezhnev era, the study of negation in political texts, etc.
Since the objects of content analysis can be not only verbal (natural language) texts but also texts of other types (for example, cartoons, photographs, advertising clips), the structural-semiotic units of content analysis can include visual and audio (most often musical) images and symbols, which can be analyzed on the same basis as units of natural language.
Functional analysis in sociology
Functional analysis is a methodology that is used to explain the operation of a complex system. The basic idea is that the system is viewed as computing a function (or, more generally, solving an information processing problem). Functional analysis suggests that such processing can be explained by decomposing this complex function into a set of simpler functions that are computed by an organized system of subprocesses.
Functional analysis is important to cognitive science because it offers a natural methodology for explaining how information processing occurs. For example, any “black box diagram” proposed as a model or theory by a cognitive psychologist is the result of the analytical stage of functional analysis. Any proposal about what constitutes a cognitive architecture can be seen as a hypothesis about the nature of cognitive functions at the level at which those functions are involved.
C. Conceptual and thematic units.
In most cases a content analyst is interested not in words as such, and certainly not in grammatical categories, but in the concepts, themes and problems that stand behind the words and matter to him; in other words, in what can be called conceptual-thematic units. A researcher interested in the place that, say, the problem of crime occupies in the public consciousness is obliged to take into account not only occurrences of the word “crime” but also mentions of contract killings and all sorts of other murders, “gangster lawlessness”, the “roof”, the “brothers”, “authorities”, “criminal power”, etc.
Anyone concerned with the problem of freedom must take into account mentions of “pressure on the press”, “bureaucratic arbitrariness”, “media control”, “access to the Internet”, etc. Anyone interested in the attitude of the public consciousness to certain realities must take into account the widest range of positive, negative and more specific assessments that can be given to these realities, and these assessments need not take the form of explicit value judgments.
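One common way to approximate conceptual-thematic units in practice is a dictionary that maps each theme to the expressions taken as its indicators. The sketch below assumes such a dictionary; the name `THEME_INDICATORS` and its contents are hypothetical, built only from the examples in the text, and a real project would need a far larger, validated list as well as human coding of indirect mentions.

```python
import re
from collections import Counter

# Hypothetical theme dictionary: theme -> expressions treated as its indicators.
THEME_INDICATORS = {
    "crime": ["crime", "contract murder", "gangster lawlessness", "criminal power"],
    "freedom": ["pressure on the press", "bureaucratic arbitrariness",
                "media control", "access to the internet"],
}

def theme_counts(text, theme_indicators):
    """Sum the occurrences of all indicator expressions for each theme."""
    lowered = text.lower()
    counts = Counter()
    for theme, indicators in theme_indicators.items():
        for phrase in indicators:
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            counts[theme] += len(re.findall(pattern, lowered))
    return counts

# Hypothetical sample text, invented for illustration only.
sample = ("The contract murder shocked the city; commentators spoke of "
          "gangster lawlessness and of growing media control.")
result = theme_counts(sample, THEME_INDICATORS)
```

Here the theme of crime is registered twice although the word “crime” itself never occurs, which is exactly the situation the paragraph above describes.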
D. Referential and quasi-referential units.
Referential, or more precisely concrete-referential, units include designations of real persons (both contemporary and historical figures), events, cities, countries, organizations, etc.; this is, so to speak, the “encyclopedic” block of units of analysis. This block, especially as regards persons, is very important and diagnostic, since it makes it possible to determine personal ratings and, no less important, to evaluate ideological systems in terms of the emblematic reference figures present in them, a kind of “ideological heroes”. An example of an interesting study of the role of reference figures in the Russian opposition discourse of 1996–1997 is the work of A.V. Duka. A specific figure may be designated in the text in various ways (“V.V. Zhirinovsky”, “Vladimir Volfovich”, “Volfych”, “Zhirik”, “the son of a lawyer”, “leader of the LDPR”, “the most pro-Oriental Russian politician”, “the main liberal democrat”, “liberalissimo”), but the concrete-referential unit is the same in all cases.
Quasi-referential units in political texts are most often represented by designations of all kinds of “forces”, the collective actors of the political scene, whose reference can range from the real (such as “the Communist Party of the Russian Federation”) through the generalized (“communists”, “liberals”, “the West”, “Islamists”) to the openly mythologized (“the world behind the scenes”). Regardless of their reference, all these characters are present in the ideological space; actions and assessments can be attributed to them, and the attitude towards them is an important political and ideological factor. The line between quasi-referential units and certain types of conceptual-thematic units is blurred, because some political concepts (for example, crime) are capable of, and even prone to, metaphorical personification.
E. Propositional units and evaluations.
Examples were given above: “Carthage must be destroyed” or “Russia is suffocating without investment”. Strictly speaking, these are examples of statements based on propositions, that is, descriptions of specific states of affairs (situations) irrespective of their modality (a demand in the first example, an assertion in the second). Along with propositions, evaluations can be (and very often are) of great interest for content analysis (“This is a very dangerous decision”). From a logical point of view they differ from propositions in important ways, but for the purposes of content analysis both a proposition and an evaluation can be regarded as the result of associating some object with some attribute. The study of the dynamics of value judgments expressed about certain persons, events and institutions is a very common type of content analytical research.
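If each evaluation found in the texts is coded as an association of an object with a polarity, the dynamics of value judgments reduce to a simple aggregation by period. The sketch below assumes that coding has already been done by a human coder; the record layout `(period, object, polarity)`, the function name and the sample records are all hypothetical.

```python
from collections import defaultdict

def net_evaluation_by_period(records):
    """
    records: iterable of (period, obj, polarity) tuples, where polarity is
    +1 (positive), -1 (negative) or 0 (neutral), as assigned by a coder.
    Returns {(obj, period): mean polarity}.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for period, obj, polarity in records:
        sums[(obj, period)] += polarity
        counts[(obj, period)] += 1
    return {key: sums[key] / counts[key] for key in counts}

# Hypothetical coded evaluations, invented for illustration only.
coded = [
    ("2000-03", "government", +1),
    ("2000-03", "government", -1),
    ("2000-03", "government", -1),
    ("2000-04", "government", +1),
]
balance = net_evaluation_by_period(coded)
```

Comparing the mean polarity for the same object across consecutive periods shows whether the evaluations are shifting; the number of records behind each value should be reported alongside it, since a mean over one or two judgments is not informative.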
F. Macrostructural units.
Macrostructural units are understood as rather complex conceptual structures that form the “upper floors” of human ideas about the world and, in particular, ideological systems. These constructions, as a rule, are in the nature of scenarios and describe stereotypical models of development, which are associated with expectations of the future, considerations of the past, emotional associations, etc. Often these designs have literary or folklore prototypes, which is reflected in their names. All of them make very strong claims to explain reality.
The term “ideologeme” is most often used to denote such constructions; various disciplines also speak of mythologems, nomadic images, etc. Among the constructions of this kind present in the public consciousness of modern Russia (and distributed, sometimes bizarrely, across different ideological systems) are, for example, the following: Conspiracy, Orgy of Corruption / Criminal Revolution / Mayhem, Robbery / Conversion of power into property, Country of Fools / City of Foolov, “No, guys, it’s not like that,” “Return to Civilization,” etc. Some recently significant ideologemes (say, the Struggle for Power, Natural Decay, or Total Incompetence) have, for various reasons, dropped out of the focus of attention of the media, and partly of the population, over the last one and a half to two years.
G. Units representing the results of conceptual operations.
There are quite a few of them, but the greatest interest for content analysis are metaphors, examples and analogies, which in general terms have already been described above.
Some metaphors are actively used in political texts, and their use is considered diagnostic both of the individual consciousness of the author of the text and of the state of public consciousness. For example, political texts often contain the so-called “military metaphor” in the variant POLITICAL CONFRONTATION IS WAR, which manifests itself in such expressions as “the war on poverty”, “a blow to the governor”, “an attack from the opposition”, “a devastating publication”, etc.
When such a metaphor is used, political confrontation, whatever form it actually takes, is experienced as war, which, incidentally, can have consequences for the real forms of political interaction. Meanwhile, the “military metaphor” is not the only way to describe the political process (and, more broadly, life in general); it can also be described using, for example, a “transport metaphor” and/or the related “path metaphor” (“We have all embarked on a difficult road together”), an “architectural metaphor” (“state building”, “building a power vertical”) and a number of others. The metaphors of political texts have been studied in considerable detail by G. Lakoff and his followers, including within the framework of content analytical methodology (the work of A.N. Baranov); it has been shown, for example, that an increase in the frequency of military metaphors is one of the correlates of increased tension in society.
No less diagnostic can be the study of the dynamics of examples and analogies. In Russian political texts, for example, an analogy (due to V. Yanov) comparing Russia with the Weimar Republic was persistently repeated until recently.
Where else to analyze content on social networks
Popsters
Price: from 399 ₽ per month for 1 social network.
What it does. The service analyzes communities. It helps determine the level of engagement of a community's audience, and you can track which posts attract the audience more or less. Popsters can sort publications by format, text volume, publication date and content. Reports can be exported in six formats: XLSX, PDF, PNG, JPG, CSV and PPTX.
JagaJam
Price: from 5,900 ₽ per month.
What it does. The service compiles in-depth statistics and analyzes engagement and content quality. It finds popular posts and recommends times to post. The service provides statistics on 30 metrics for analyzing brand pages and is considered one of the fundamental tools for analytics in SMM.
Units, categories and characteristics.
Although content analysis is basically a quantitative method, it almost always has, as already mentioned, a significant qualitative component. This is so because the units of content analysis, as the previous section shows, are most often substantive, and their identification rests on semantic criteria; many of the units are generalized categories (this applies primarily to themes and ideologemes). In other words, a content analyst performs a quantitative analysis of qualitative categories. But the matter does not end there. Many content analytical projects assess not only the degree to which certain units are represented in the text but also rate these units on graded qualitative scales. Such scales include, in particular, the scale of abstractness (in effect, difficulty of perception) of a given content, proposed by Charles Osgood, and the scale of distance to the individual (some content components may directly concern the reader or readers, while others may be of only idle interest). Combined with the results of the content analysis proper, rating the (thematic) units of analysis on these scales yields a three-dimensional scheme of the kind proposed, for example, by the French cultural theorist A. Moles.
Obviously, other scales can also be used in the analysis; in addition, units of content analysis can be combined into various broader categories.
Further actions
Once the categories have been formulated, it is necessary to select an appropriate unit of analysis, such as a piece of content or a linguistic component of speech, that is an indicator of the phenomenon of interest. In many examples of content analysis research, the phenomena under study are most often represented by a word, a simple sentence, a judgment, a topic, an author, a character, a situation in society, a message as a whole, etc.
Complex types of analytical research use not one but several units. Isolated elements can be misinterpreted, so they are analyzed within larger linguistic structures, which determine how the text is divided and within which units of context are identified. For a single word, the unit of context is the sentence, itself a component of a passage united by a common meaning.
Next, the unit of counting must be set. The most commonly used are spatial and temporal measures (number of lines, area, broadcast time) and the frequency with which features appear in the material.
A prerequisite is the development of a summary table of the content being studied: the main research document, the appearance of which depends on the type of analysis. For example, content analysis in psychology uses a table, which is a system of coordinated and subordinated analytical categories. This is a kind of questionnaire, where each question requires a number of answers that determine the essence of the text.
Using the coding matrix, the results of the analysis are recorded. If the sample size exceeds 100 units, then matrix sheet notebooks are used. When the number of subjects is less than 100, bivariate or multivariate analysis is sufficient. For each text you need to use your own coding matrix, which requires painstaking and time-consuming work. If the sample size exceeds the permissible norm, then a comparison of the key characteristics of the analysis is carried out on a computer.
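A coding matrix of the kind described here can be thought of as a table with one row per text and one column per category, each cell holding the number of coded units. The sketch below is one possible representation, not a prescribed format; the category list, function names and sample codings are hypothetical.

```python
from collections import Counter

# Hypothetical list of analytical categories (the columns of the matrix).
CATEGORIES = ["crime", "freedom", "economy"]

def build_coding_matrix(coded_units):
    """
    coded_units: iterable of (text_id, category) pairs assigned by a coder.
    Returns {text_id: Counter({category: count})}.
    """
    matrix = {}
    for text_id, category in coded_units:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        matrix.setdefault(text_id, Counter())[category] += 1
    return matrix

def as_rows(matrix):
    """One row per text: [text_id, count for each category in CATEGORIES order]."""
    return [
        [text_id] + [matrix[text_id][c] for c in CATEGORIES]
        for text_id in sorted(matrix)
    ]

def column_totals(matrix):
    """Total count of each category over all texts."""
    return {c: sum(row[c] for row in matrix.values()) for c in CATEGORIES}

# Hypothetical codings, invented for illustration only.
coded = [("t1", "crime"), ("t1", "crime"), ("t1", "economy"), ("t2", "freedom")]
matrix = build_coding_matrix(coded)
rows = as_rows(matrix)
totals = column_totals(matrix)
```

Rejecting unknown categories at entry time is a deliberate choice in this sketch: it keeps the matrix consistent with the summary table of categories fixed before coding began.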
“Frontal” and “raid” content analysis.
Content analytical research can be divided into two large classes which, to use the “military metaphor” mentioned above, can be called frontal and raid. The task of frontal content analytical research is to compile the most complete picture possible of the information flow, either as a momentary snapshot or over a certain period in order to assess the dynamics. It is, so to speak, an attempt to get an objectified answer to the question “What is being written?” The units of such analysis can in principle be anything, but most often they are thematic units or keywords, less often evaluations and propositions, and even less often macrostructural units.
Such analysis is usually of a purely applied nature and is carried out in monitoring mode. Since its goal is to formulate a general idea of the content of the media and, through it, of the public consciousness, it should ideally strive for the widest possible coverage of the information flow. In practice, however, full coverage is most often impossible and often unnecessary. Thus, the problem of compiling a representative sample appears on the agenda of content analytical research - a traditional problem of empirical sociological research, which, if unsuccessfully solved, can completely discredit its results. It is solved in the case of content analysis using traditional sociological methods.
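The text does not say which sampling scheme is meant, so the following is only one possibility among the traditional ones: a proportional stratified sample, in which each stratum (for example, a type of publication) is represented in proportion to its share of the whole flow. The function name and the example strata are hypothetical.

```python
import random

def proportional_stratified_sample(items, stratum_of, sample_size, seed=0):
    """
    Draw a sample in which each stratum is represented roughly in proportion
    to its share of `items`. `stratum_of` maps an item to its stratum label.
    Because quotas are rounded (and every stratum gets at least one item),
    the result may differ slightly from `sample_size`.
    """
    rng = random.Random(seed)  # fixed seed so that the draw is reproducible
    strata = {}
    for item in items:
        strata.setdefault(stratum_of(item), []).append(item)
    total = len(items)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        quota = max(1, round(sample_size * len(members) / total))
        quota = min(quota, len(members))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical information flow: 80 texts from dailies and 20 from weeklies.
texts = [("daily", i) for i in range(80)] + [("weekly", i) for i in range(20)]
chosen = proportional_stratified_sample(texts, lambda t: t[0], sample_size=10)
```

With these proportions the sample contains eight texts from dailies and two from weeklies, mirroring the 80:20 split of the flow.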
Raid analysis, in contrast to frontal analysis, is focused on solving particular and sometimes quite exotic problems that arise, as a rule, from research rather than applied interests. Here the sampling problem is solved in connection with the formulation of these research goals and the definition of the units of analysis. The sample is justified by standard sociological criteria but may also depart from them; what matters is that the departure is acknowledged and the need for it specifically justified.
The need for quantitative analysis
Remark 1
Quantitative text analysis is used when a high degree of accuracy is required when comparing single-order data. The first step in this direction is to identify indicators of key research concepts.
It is advisable to use quantification in content analysis when the content of texts is to be compared with other quantitative characteristics. Quantitative characteristics must necessarily be linked to qualitative ones, which in the study of documents record the properties and features of the objects identified by the researcher and allow him to analyze how these features relate to the field of activity of the compiler of the document.