STATISTICS IN MEASUREMENT AND EVALUATION-I
4.0 INTRODUCTION
In the previous unit, you learnt about psychological testing. In this unit, the discussion will turn towards statistics in measurement and evaluation. Statistics is the science of collecting and analysing numerical data in large quantities, particularly for the purpose of inferring proportions in a whole from those in a representative sample.
One can analyse quantitative data through techniques such as measures of central tendency. Measures of central tendency are of various types, such as the arithmetic mean, mode and median. The arithmetic mean is also commonly known simply as the mean. Even though an average, in general, can be any measure of central location, when we use the word average in our daily routine, we usually mean the arithmetic average.
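As a quick illustration of these measures, the following Python sketch (the scores are invented purely for illustration) computes the arithmetic mean, median and mode of a small set of values using the standard library:

```python
# Illustrative only: a small, made-up set of test scores.
from statistics import mean, median, mode

scores = [12, 15, 11, 15, 13, 14, 15, 10, 12, 13]

print("Arithmetic mean:", mean(scores))   # sum of the scores / number of scores
print("Median:", median(scores))          # middle value of the ordered scores
print("Mode:", mode(scores))              # most frequently occurring score
```

For this made-up data the mean is 13, the median is 13 and the mode is 15, which already shows that the three measures need not coincide.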
4.1 UNIT OBJECTIVES
After going through this unit, you will be able to:
· Discuss how data is interpreted
· Describe how data is represented graphically
· Discuss the different measures of central tendency
· Illustrate how mean, mode, and median can be calculated
· Explain correlation analysis
· Describe the normal probability curve
4.2 STATISTICAL TREATMENT OF DATA
Meaning, importance and steps involved in processing data
Research does not merely consist of data that is collected. Research is incomplete without proper analysis of the collected data. Processing of data involves analysis and manipulation of the collected data by performing various functions. The data has to be processed in accordance with the outline laid down at the time of developing the research plan. Processing of data is essential for ensuring that all relevant data has been collected to perform comparisons and analyses. The functions that can be performed on data are as follows:
· Editing
· Coding
· Tabulation
· Classification
Experts are usually of the opinion that the exercises of processing and analysing data are inter-related, and that the two should therefore be thought of as one and the same thing. It is argued that analysis of data generally involves a number of closely-related operations, which are carried out with the objective of summarizing the collected data and organizing it in such a way that it answers the research questions associated with it.
However, in technical terms, processing of data involves representing the data in a way that is open to analysis. Similarly, the analysis of data is defined as the computation of certain measures along with searching for patterns of relationship that may exist among the data groups.
Editing of data
Editing of data involves the testing of data collection instruments in order to ensure maximum accuracy. This includes checking the legibility, consistency and completeness of the data. The editing process aims at avoiding equivocation and ambiguity. The collected raw data is also examined to detect errors and omissions, if any. A careful scrutiny is performed on the completed questionnaires and schedules to ensure that the data has the following features:
· Accuracy
· Consistency
· Unity
· Uniformity
· Effective arrangement
The stages at which editing should be performed can be classified as follows:
· Field Editing: This involves reviewing the reporting forms, by the investigator, that are written in an abbreviated or illegible form by the informant at the time of recording the respondent's responses. This type of editing must be done immediately after the interview. If performed after some time, such editing becomes complicated for the researcher, as it is difficult to decipher any particular individual's writing style. The investigator needs to be careful while field editing and must refrain from correcting errors or omissions by guesswork.
· Central Editing: This kind of editing involves a thorough editing of the entire data by a single editor or a team of editors. It takes place when all the schedules created according to the research plan have been completed and returned to the researcher. Editors correct errors such as data recorded in the wrong place or data recorded in months when it should have been recorded in weeks. They can provide an appropriate answer to incorrect or missing replies by reviewing the other information in the schedule. At times, the respondent can be contacted for clarification. In some cases, if the answer is inappropriate or incomplete and an accurate answer cannot be determined on any basis, the editor should delete or remove that answer from the collected data and put a note such as 'no answer' in its place. Answers that can easily be identified as wrong should be dropped from the final results.
Besides using the above-stated methods according to the data source, the researcher should also keep in mind the following points while editing:
· Familiarity with the instructions given to interviewers and coders
· Know-how of editing instructions
· Single-line striking for deletion of an original entry
· Standardized and distinctive editing of data
· Initialling of all answers that are changed
Coding of data
Coding of data can be defined as representing the data symbolically using some predefined rules. Once data is coded and summarized, the researcher can analyse it and relationships can be found among its various categories.
Checklist for coding
This enables the researcher to classify the responses of the individuals according to a limited number of categories or classes. Such classes should possess the following important characteristics:
· Classes should be appropriate and in accordance with the research problem under consideration.
· They must include a class for every data element.
· There should be mutual exclusivity, which means that a specific answer can be placed in one and only one cell of a given category set.
· The classes should be one-dimensional. This means that every class is defined in terms of only one concept.
Significance of Coding
Coding of data is necessary for its efficient analysis. Coding facilitates the reduction of data from a wide variety of responses to a small number of classes. Thus, only that information which is important and critical for analysis is retained in the research. Coding decisions are usually taken at the designing stage of the questionnaire. This makes it possible to pre-code the questionnaire choices, which, in turn, is helpful for computer tabulation.
However, in the case of hand coding, some standard method should be used. One such method is to code in the margin with a coloured pencil. The other method is to transcribe the data from the questionnaire to a coding sheet. Whatever method is adopted, you should ensure that coding errors are altogether eliminated or reduced to a minimum level.
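As a minimal sketch of how such coding can be carried out programmatically (the category labels and numeric codes below are hypothetical, not taken from any particular questionnaire), the following snippet maps raw responses to predefined codes and summarizes them:

```python
# Hypothetical example: assigning predefined numeric codes to survey responses.
from collections import Counter

# Predefined coding scheme (mutually exclusive: one code per possible answer).
codes = {"yes": 1, "no": 2, "undecided": 3}

raw_responses = ["yes", "no", "yes", "undecided", "yes", "no"]

# Replace each response with its code; an unforeseen answer would need a new code.
coded = [codes[r] for r in raw_responses]
print("Coded responses:", coded)              # [1, 2, 1, 3, 1, 2]
print("Frequency of each code:", Counter(coded))
```

Once responses are reduced to codes in this way, frequencies and cross-tabulations over the coded classes become straightforward.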
Classification of Data
Research studies involve extensive collection of raw data and usage of the data to implement the research plan. To make the research plan easier, the data needs to be classified in different groups for understanding the relationship among the different phases of the research plan. Classification of data involves arrangement of data in groups or classes
on the basis of some common characteristics. The methods of classification can be divided under the following two headings:
· Classification according to attributes
· Classification according to class intervals
Figure 4.1 shows the categories of data.
Fig. 4.1 Data Classification
Classification of data according to attributes
Data is classified on the basis of similar features as follows:
· Descriptive classification: This classification is performed according to qualitative features and attributes which cannot be measured quantitatively. These features can be either present or absent in an individual or an element. The features related to descriptive classification of attributes can be literacy, sex, honesty, solidarity, etc.
· Simple classification: In this classification, the elements of data are categorized on the basis of those that possess the concerned attribute and those that do not.
· Manifold classification: In this classification, two or more attributes are considered simultaneously and the data is categorized into a number of classes on the basis of those attributes. The total number of classes of the final order is given by 2^n, where n = number of attributes considered, as illustrated in the sketch below.
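To see why n attributes considered together yield 2^n classes of the final order, the short sketch below (with two made-up attributes) enumerates every combination of presence and absence:

```python
# Illustrative sketch: two attributes considered together give 2**2 = 4 classes.
from itertools import product

attributes = ["literate", "employed"]          # hypothetical attributes
classes = list(product(["present", "absent"], repeat=len(attributes)))

for combo in classes:
    print(dict(zip(attributes, combo)))        # one class of the final order
print("Number of classes:", len(classes))      # 2**n = 4
```

With a third attribute the same enumeration would give 2^3 = 8 classes, and so on.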
Classification of data according to class intervals
Classifying data according to the class intervals is a quantitative phenomenon. Class intervals help categorize the data with similar numerical characteristics, such as income,
production, age, weight, etc. Data can be measured through some statistical tools like mean, mode, median, etc. The different
categories of data according to class intervals are as follows:
· Statistics of variables: This term refers to measurable attributes, as these typically vary over time or between individuals. The variables can be discrete, i.e., taking values from a countable or finite set; continuous, i.e., having a continuous distribution function; or neither. This concept of a variable is widely utilized in the social, natural and medical sciences.
· Class intervals: They refer to a range of values of a variable. This interval is used to break up the scale of the variable in order to tabulate the frequency distribution of a sample. A suitable example of such data classification is the categorization of the birth rate of a country. In this case, babies aged zero to one year will form a group; those aged two to five years will form another group, and so on. The entire data is thus categorized into a number of groups or classes, in other words, class intervals. Each class interval has an upper limit as well as a lower limit, which are known as the class limits. The difference between two class limits is known as the class magnitude. Classes can have equal or unequal class magnitudes.
The number of elements which come under a given class is called the frequency of the given class interval. All class intervals, with their respective frequencies, are taken together and described in a tabular form called the frequency distribution.
Problems related to classification of data
The problems related to classification of data on the basis of class intervals are divided into the following three categories:
(i) Number of classes and their magnitude: There are differences regarding the number of classes into which data can be classified. As such, there are no pre-defined rules for the classification of data. It all depends upon the skill
and experience of the researcher. The researcher should display the data in such a way that it should be clear and meaningful to the analyst.
As regards the magnitude of classes, it is usually held that class intervals should be of equal magnitude, but in some cases unequal magnitudes may result in a better classification. It is the researcher’s objective and judgement that plays a significant role in this regard. In general, multiples of two, five and ten are preferred while determining
class magnitudes. H.A. Sturges suggested
the following formula to determine the size of class interval:
i = R / (1 + 3.3 log N)
where,
i = size of class interval.
R = Range (difference between the values of the largest element and smallest element among the given elements).
N = Number of items to be grouped.
Sometimes, data may contain one or two or very few elements with very high or very low values. In such cases, the researcher can use an open-ended interval in the overall frequency distribution. Such intervals can be expressed as 'below two years' or 'twelve years and above'. Although such intervals are not desirable, they sometimes cannot be avoided.
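A minimal sketch of Sturges' suggestion, using made-up values purely for illustration:

```python
# Sketch of Sturges' rule for the size of a class interval.
import math

values = [12, 47, 35, 28, 64, 53, 41, 22, 58, 31, 19, 44]   # made-up data
R = max(values) - min(values)            # range
N = len(values)                          # number of items to be grouped
i = R / (1 + 3.3 * math.log10(N))        # size of class interval
print("Range:", R)
print("Suggested class interval size:", round(i, 2))
```

For these twelve values the range is 52 and the suggested interval size is roughly 11, which the researcher would normally round to a convenient figure such as 10.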
(ii) Choice of class limits: While choosing class limits, the researcher must determine the mid-point of a class interval. A mid-point is generally derived by taking the sum of the upper and lower limits of a class and dividing it by two. The mid-point and the actual average of the elements of that class interval should remain as close to each other as possible. In accordance with this principle, the class limits should be located at multiples of two, five, ten, twenty, hundred and such other figures. The class limits can generally be stated in any of the following forms:
o Exclusive type class intervals: These intervals are usually stated as follows:
• 10–20
• 20–30
• 30–40
• 40–50
These intervals should be read in the following way:
• 10 and under 20
• 20 and under 30
• 30 and under 40
• 40 and under 50
In the exclusive type of class intervals, the elements whose values are equal to the upper limit of a class are grouped in the next higher class. For example, an item whose value is exactly thirty would be put in the 30–40 class interval and not in the 20–30 class interval. In other words, an exclusive type of class interval is one in which the upper limit of a class interval is excluded and items with values less than the upper limit, but not less than the lower limit, are put in the given class interval.
o Inclusive type class intervals: These intervals are normally stated as follows:
• 11–20
• 21–30
• 31–40
• 41–50
These should be read as follows:
• 11 and under 21
• 21 and under 31
• 31 and under 41
• 41 and under 51
In this method, the upper limit of a class interval is also included in the concerned class interval. Thus, an element whose value is twenty will be put in the 11–20 class interval. The stated upper limit of the class interval 11–20 is twenty, but the real upper limit is 20.999999, and as such the 11–20 class interval really means eleven and under twenty-one. When the data to be classified happens to be discrete, the inclusive type of classification should be applied. But when the data happens to be continuous, the exclusive type of class intervals can be used.
(iii) Determining the frequency of each class: The frequency of each class can be determined using tally sheets or mechanical aids. In tally sheets, the class groups are written on a sheet of paper and, for each item, a stroke (a small vertical line) is marked against the class group in which it falls. The general practice is that after every four small vertical lines in a class group, the fifth line for an element falling in the same group is drawn as a diagonal line through the four lines. This enables the researcher to count the elements in each of the class groups. Table 4.1 displays a hypothetical tally sheet.
Table 4.1 A Tally Sheet
[Table 4.1 is not reproduced here: it lists class groups with their tally marks and frequencies.]
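A small sketch (with made-up values and class limits) of how the tally-sheet counting described above can be done programmatically, using the exclusive convention in which a value equal to an upper limit falls into the next higher class:

```python
# Sketch: counting the frequency of each class interval (exclusive type),
# where an item equal to an upper limit goes into the next higher class.
values = [12, 30, 25, 40, 18, 33, 27, 45, 30, 22]          # made-up data
limits = [(10, 20), (20, 30), (30, 40), (40, 50)]          # exclusive intervals

frequencies = {f"{lo}-{up}": 0 for lo, up in limits}
for v in values:
    for lo, up in limits:
        if lo <= v < up:                 # 'lo and under up'
            frequencies[f"{lo}-{up}"] += 1
            break

print(frequencies)   # the two 30s are counted in 30-40, not in 20-30
```

The resulting dictionary plays the same role as the frequency column of the tally sheet.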
In case of large inquiries and surveys, class frequencies can be determined by means of mechanical aids, i.e., with the help of machines. Such machines function, either manually or automatically and run on electricity. These machines can sort out
cards at a speed of around 25,000 cards per hour. Although this method increases the speed,
it is an expensive method.
Tabulation of data
In simple terms, tabulation means placing the results and data collected from research in a tabular form.
Methods of tabulation
Tabulation can be done either manually or mechanically using various electronic devices. Several factors like the size and type of study, cost considerations, time pressures
and availability of tabulating machines decide the choice of tabulation. Relatively large data requires computer tabulation. Manual tabulation is preferred in case of small inquiries, when the number of questionnaires is small and they are of relatively short length. The different methods used in hand tabulation are as follows:
• Direct tally method:
This method involves simple codes, which the
researcher can use to directly
tally data with the questionnaire. The codes are written on a sheet of paper
called tally sheet and for each response, a stroke is marked against
the code in which it falls. Usually, after every four strokes against a particular code, the fifth response is indicated by drawing a diagonal or horizontal line through the strokes. These groups are easy to count and the data is sorted against each code conveniently.
• List and tally method: In this method, code responses may be transcribed into a large worksheet, allowing a line for each questionnaire. This facilitates listing of a large number of questionnaires in one worksheet. Tallies are then made for each question.
• Card sort method: This is the most flexible hand tabulation method, where the data is recorded on special cards that are of convenient sizes and shapes and have a series of holes. Each hole in the card stands for a code. When the cards are stacked, a needle passes through
a particular hole representing a particular code. These cards are then separated and counted. In this way,
frequencies of various codes can be found out by the repetition of this technique.
Significance of tabulation
Tabulation enables the researcher to arrange data in a concise and logical order. It summarizes the raw data
and displays the same in a compact form for further analysis. It helps in the orderly arrangement of data in rows
and columns. The various
advantages of tabulation of data are as follows:
· A table saves space and reduces descriptive and explanatory statements to the minimum.
· It facilitates and eases the comparison process.
· Summation of elements and detection of omissions and errors become easy in a tabular description.
· A table provides a basis for various statistical computations.
Checklist for tables
A table should communicate the required information to the reader in such a way that it becomes easy for him/her to read, comprehend and recall information when required. Certain conventions have to be followed during tabulation of data. These are as follows:
· All tables should have a clear, precise and adequate title to make them intelligible enough without any reference to the text.
· Tables should be featured with clarity and readability.
· Every table should be given a distinct number to facilitate easy reference.
· The table should be of an appropriate size and tally with the required information.
· Headings for columns and rows should be in bold font letters. It is a general rule to include an independent variable in the left column or the first row. The dependent variable is contained in the bottom row or the right column.
· Numbers should be displayed such that they are neat and readable.
· Explanatory footnotes, if any, regarding the table should be placed directly beneath the table, along with the reference symbols used in the table.
· The source of the table should be indicated just below the table.
· The table should contain thick lines to separate data under one class from the data under another class and thin lines to separate the different subdivisions of the classes.
· All column figures should be properly aligned.
· Abbreviations should be avoided in a table to the best possible extent.
· If data happens to be large, then it should not be crowded in a single table. It makes the table unwieldy and inconvenient.
Tabulation can also be classified as simple and complex. The former type of tabulation gives information about one or more groups of independent variables, whereas the latter shows the division of data in two or more categories.
Use of statistical tools for analysis
A researcher needs to be familiar with different statistical methods so as to be able to use the appropriate method in his research study. There are certain basic statistical methods that can be classified into the following three groups:
· Descriptive statistics
· Inferential statistics
· Measures of central tendency and dispersion
Descriptive statistics
According to Smith, descriptive statistics is the formulation of rules and procedures where data can be
placed in a useful and significant order. The foundation of applicability of descriptive statistics is the need for complete data presentation. The most important and general methods used in descriptive statistics are as follows:
· Ratio: This indicates the relative frequency of the various variables to one another.
· Percentage: Percentages (%) can be derived by multiplying a ratio by 100. A percentage is thus a ratio expressed per a standard unit of 100.
· Frequency table: It is a means to tabulate the rate of recurrence of data. Data arranged in such a manner is known as a distribution. In the case of a large distribution, larger class intervals are used. This enables the researcher to acquire a more orderly system.
· Histogram: It is the graphical representation of a frequency distribution table. The main advantage of graphical representation of data in the form of a histogram is that the data can be interpreted immediately.
· Frequency polygon: It is used for the representation of data in the form of a polygon. In this method, a dot representing the frequency of each class is placed at the middle of the class interval, and a frequency polygon is derived by linking these dots. An additional class is sometimes added at the end of the line with the purpose of creating an anchor.
· Cumulative frequency curve: The cumulative frequency is obtained by starting from the lowest class interval and adding the frequencies class by class. This facilitates the representation of the number of persons that fall below each class interval. The researcher can derive a curve from the cumulative frequency table with the purpose of reflecting the data in a graphical manner, as sketched after this list.
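The following sketch (with invented scores) computes several of the descriptive measures listed above: a percentage, a frequency table over class intervals and the cumulative frequencies from which a cumulative frequency curve could be drawn.

```python
# Illustrative sketch of basic descriptive measures on made-up scores.
scores = [34, 45, 52, 47, 38, 55, 61, 43, 49, 58, 36, 50]

# Percentage: the ratio of passing scores (40 or above, an assumed cut-off) times 100.
passed = sum(1 for s in scores if s >= 40)
print("Percentage passed:", 100 * passed / len(scores))

# Frequency table over class intervals of width 10 (exclusive type).
intervals = [(30, 40), (40, 50), (50, 60), (60, 70)]
freq = [sum(1 for s in scores if lo <= s < up) for lo, up in intervals]
print("Frequency table:", list(zip(intervals, freq)))

# Cumulative frequencies, adding class by class from the lowest interval.
cumulative, running_total = [], 0
for f in freq:
    running_total += f
    cumulative.append(running_total)
print("Cumulative frequencies:", cumulative)
```

Plotting the frequencies against the intervals would give the histogram or frequency polygon, and plotting the cumulative frequencies would give the cumulative frequency curve.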
Inferential statistics
Inferential statistics enable researchers to draw conclusions that go beyond the data actually observed. Researchers can make deductions or statements, using inferential statistics, with regard to the broad population from which samples of known data have been drawn. These methods are called inferential or inductive statistics. They include the following common techniques:
· Estimation: It is the calculated approximation of a result, which is usable even if the input data is incomplete or uncertain. It involves deriving an approximate calculation of a quantity or a degree, for example, estimating the cost of a project or deriving a rough idea of how long the project will take.
· Prediction: It is a statement or claim that a particular event will occur in the future. It is based on observation, experience and scientific reasoning about what will happen in the given circumstances or situations.
· Hypothesis Testing: A hypothesis is a proposed explanation whose validity can be tested. Hypothesis testing attempts to validate or disprove preconceived ideas. In creating a hypothesis, one thinks of a possible explanation for an observed behaviour. The hypothesis dictates the data that should be selected and analysed for further interpretation; a minimal sketch follows this list.
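As a minimal, hedged illustration of hypothesis testing (both the sample and the hypothesized population mean of 50 are invented), the sketch below uses a one-sample t-test from SciPy to ask whether the sample mean differs from the hypothesized value:

```python
# Illustrative sketch: one-sample t-test against a hypothesized population mean.
from scipy import stats

sample = [52, 48, 55, 60, 47, 53, 58, 50, 49, 57]   # made-up sample
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

print("t statistic:", round(t_stat, 3))
print("p value:", round(p_value, 3))
# A small p value (say, below 0.05) would lead us to reject the idea that the
# population mean is 50; a large p value would give us no reason to reject it.
```

The same logic, with different test statistics, underlies most of the inferential techniques mentioned above.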
There are also two chief statistical methods based on the tendency of data to cluster or scatter. These methods are known as measures of central tendency
and measures of dispersion. You will learn about these in the subsequent section.
4.2.1 Interpretation of Data
Analysis of data is the process of transforming data for the purpose of extracting useful information, which in turn facilitates the discovery of useful conclusions. Drawing conclusions from the analysed data is known as interpretation of data. In the case of experimental or survey data, the analysis also involves estimating the values of unknown population parameters and testing hypotheses about them.
Analysis of data can be either descriptive or inferential. Inferential analysis is also known as statistical analysis. The descriptive analysis is used to describe the basic features of the data in a study, such as persons, work groups and organizations. The inferential analysis is used to make inferences from the data, which means that we are trying to
understand some process and make some possible predictions based
on this understanding.
Types of analysis
The various types of analyses are as follows:
· Multiple Regression Analysis: This analysis is used to predict a single dependent variable from a set of independent variables. In multiple regression analysis, it is assumed that the independent variables are not correlated with one another. A rough sketch of such a fit follows this list.
· Multiple Discriminant Analysis: In multiple discriminant analysis, there is a single dependent variable that is very difficult to measure directly. One of the main objectives of this type of analysis is to understand group differences and to predict the likelihood that an entity, i.e., an individual or an object, belongs to a particular class or group based on several metric independent variables.
· Canonical Analysis: It is a method for assessing the relationship between variables. This analysis also allows you to investigate the relationship between two sets of variables.
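A rough sketch of a multiple regression fit (all numbers invented), predicting one dependent variable from two independent variables by ordinary least squares with NumPy:

```python
# Sketch: multiple regression of y on two independent variables via least squares.
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)        # made-up predictor 1
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)        # made-up predictor 2
y = np.array([5, 6, 11, 12, 17, 18], dtype=float)     # made-up dependent variable

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print("Intercept and slopes:", np.round(coef, 3))     # close to [1, 2, 1] here
print("Predicted values:", np.round(X @ coef, 2))
```

The fitted coefficients describe how the dependent variable changes with each independent variable when the other is held constant.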
Univariate, Bivariate and Multivariate Analyses
There are also many types
of analyses performed according to the variance that
exists in data. They are carried out to check if the differences among three or more variables are so significant that data has to be evaluated statistically. There are three types of such analyses; namely, univariate, bivariate and multivariate analyses. These types are discussed as follows:
· Univariate Analysis: In this analysis, only a single variable is taken into consideration. It is usually the first activity pursued while analysing data. It is performed with the purpose of describing each variable in terms of mean, median or mode, and variability. Examples of such analysis are averages or a set of cases that may come under a specific category amidst a whole sample.
· Bivariate Analysis: This type of analysis examines the relationship between two variables. It tries to find the extent of association that exists between these variables. Thus, a bivariate analysis may help you, for example, to find whether the variables of irregular meals and migraine headaches are
associated. It may also help to find the extent to which they may be associated. Here, two variables are thus statistically measured simultaneously.
· Multivariate Analysis: This type of analysis involves observation and analysis of three or more statistical variables at a time. Such an analysis is performed using statistical tests or even a tabular format. Thus, for example, using the multivariate analysis method, you can study the variables of age, educational qualification and annual income of a given set of population at the same time.
Usually, these types of analyses are more convenient when performed in a tabular format. Multivariate analysis involves using a cross-classification or contingency table. Such a table is made up of rows and columns showing the frequencies of two variables, one displayed in the rows and the other in the columns. This is more popularly known as constructing the bivariate table. Traditionally, the independent variable is displayed in the columns and the dependent one in the rows. A multivariate table, if related to the same data, is the result of combining the bivariate tables. In this case, each bivariate table is known as a partial table. Usually, a multivariate table is created with the purpose of explaining or replicating the primary relationship that is found in the bivariate table. Tables 4.2 (a) and (b) show an example of a bivariate table and a multivariate table.
Table 4.2 (a) Bivariate Table

|                               | 1991        | 1992        | 1993        |
|-------------------------------|-------------|-------------|-------------|
| Percentage of students failed | 33 per cent | 38 per cent | 42 per cent |
| Percentage of students passed | 67 per cent | 62 per cent | 58 per cent |

Table 4.2 (b) Multivariate Table

|                                              | 1991 (First Attempt) | 1992 (Second Attempt) | 1993 (Third Attempt) |
|----------------------------------------------|----------------------|-----------------------|----------------------|
| Percentage of students who passed in Maths   | 27 per cent          | 35 per cent           | –                    |
| Percentage of students who passed in English | 53 per cent          | 60 per cent           | 44 per cent          |
Although the data in both tables is
related, as the variable of attempts is distinct,
the multivariate table has been displayed separately in this example. However, you should note that the tables have dealt simultaneously with two or more variables of the data.
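A brief sketch of how such a bivariate (contingency) table can be constructed programmatically; the records below are invented and are not the data behind Table 4.2:

```python
# Sketch: building a bivariate (contingency) table of percentages with pandas.
import pandas as pd

# Hypothetical records: one row per student.
df = pd.DataFrame({
    "year": [1991, 1991, 1992, 1992, 1993, 1993, 1993, 1991],
    "result": ["passed", "failed", "passed", "passed",
               "failed", "passed", "failed", "passed"],
})

# Rows hold the dependent variable (result), columns the independent one (year).
table = pd.crosstab(df["result"], df["year"], normalize="columns") * 100
print(table.round(1))       # percentage of students passed/failed in each year
```

Adding a third variable (for example, attempt number) as a further index would give the kind of partial tables that make up a multivariate table.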
Data interpretation
Data interpretation refers to the identification of trends in different variables. The researcher uses statistics for this purpose. The researcher is required to be familiar with the knowledge of the scales of measurement. This enables him/her to choose
the appropriate statistical method for his/her research
project. The scales of measurement facilitate the allotment of numerical values to characteristics adhering to specific rules. Measurement is also related to the levels of measurement of data, namely the nominal, ordinal, interval and ratio levels. These levels can be explained as follows:
· Nominal measurement: The nominal measurement assigns a numerical value to a specific characteristic. It is the most fundamental form of measurement. The nominal measurement represents the lowest level of data available for measurement.
· Ordinal measurement: This type of measurement involves allotting a numerical value to a specific feature in terms of a specific order. The ordinal scale displays the way in which the entity is measured. The ordinal scale of measurement is used to calculate and derive data pertaining to the median, percentage, rank order, correlations and percentiles.
· Interval measurement: A researcher can depict the difference between one aspect of the data and another using this level of measurement. The interval scale of measurement is useful for the researcher in several ways. It can be applied to calculate arithmetic means, averages and standard deviations, and to determine the correlation between different variables.
· Ratio measurement: In this method, there are fixed proportions (ratios) between the numerical values and the amounts of the characteristic they represent. A researcher should remember while measuring at the ratio level that there exists a fixed zero point. The ratio level of measurement allows researchers to determine whether the aspects being measured possess a certain characteristic. Almost any type of arithmetical calculation can be executed using this scale of measurement.
The most important features of any measuring scale are its reliability and validity, which are explained as follows:
· Reliability: This term deals with consistency and accuracy. A measurement scale can be said to be reliable when it yields consistent results. In other words, when the same researcher repeats a test, i.e., with a different group resembling the original group, he/she should get the same results as before.
· Validity: According to Leedy, validity is the assessment of the soundness and the effectiveness of the measuring instrument, i.e., whether it measures what it is supposed to measure. There are several types of validity, which include the following:
o Content Validity: It deals with the accuracy with which an instrument measures the factors or content of the course or situations of the research study.
o Prognostic Validity: It depends on the possibility of making judgements from results obtained by the concerned measuring instrument. The judgement is future oriented.
o Simultaneous Validity: This involves comparison of one measuring instrument with another that measures the same characteristic and is available immediately.
4.3 FREQUENCY DISTRIBUTION AND GRAPHIC REPRESENTATION OF DATA
Data refers to a recorded description of the observations of an individual or a group of individuals. Examples of data may be the marks achieved by students on a test, recorded observations regarding the availability of various types of flowers in a park, recorded rates of various types of vegetables available in the local market, prices of different types of laptops made by various companies, etc.
Data may be collected from the following two important sources:
1. External sources: These may be divided into the following three categories:
(i) Primary sources: These are the sources which are original in nature, i.e., the data is collected from the real object regarding which the research is going on.
(ii) Secondary sources: These are the sources that use data collected through the primary sources.
(iii) Tertiary sources: These sources filter data from secondary sources.
2. Internal sources: The data collected or created by the organization itself are called internal sources. Large organizations generally create their own data. For example, the Central government generates or collects large volumes of data on population through the census, data related to the enrolment of students at various levels of schooling, literacy rates at the state and national levels, and so on. All reports submitted by various committees and commissions of the government constitute internal data.
Nature of Data
Data may be numerical like 2, 4, 150 and 6, and so on, or categorical, for example, republican, democrat, etc. As there is no hierarchy in the categories, it is not ordinal in nature. Numerical data may be discrete or continuous in nature. The discrete data is created either from a finite number or a countable number of possible values. On the other hand, continuous data is created from infinite number of possible values, which may be presented in the form of points on a continuous scale having no gaps or interruptions on it.
In other words, it may be said that discrete data can be counted, whereas continuous data cannot be counted. Some examples of discrete data are the number of cars or motorcycles in a parking lot, fatal airline accidents that have occurred in India, the number of students enrolled in a course, cows grazing in a field, etc. As all of these can be counted, such data can be called discrete data. Examples of continuous data are the weight of a person, the height of an individual or a child, the speed of a car, the temperature measured by a thermometer, the quantity of water coming out of a tap, etc. All these data cannot be measured with absolute precision.
Numerical data may be classified into cardinal and ordinal. In cardinal data, the absolute difference between two given numbers is meaningful (e.g., 10 miles, 8 miles, 6 miles, 2 miles). Notice that there is a difference of two miles between the first two numbers, i.e., 10 miles and 8 miles. It may also be said that 10 miles is one and a quarter times 8 miles and, similarly, that 6 miles is three times 2 miles. Ordinal data gives a comparative picture among its various values. In this case, the difference between two numbers conveys only order and not the absolute difference, as it does for cardinal data. For example, P is either larger than Q, equal to Q or less than Q.
Scales of measurement
Measurement is the process of assigning numbers or symbols or grades to objects or events in a well-designed manner. There are four scales of measurements, listed as follows:
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
1. Nominal scale: Statistically, this scale is the lowest measurement scale or level used by us in data analysis. In nominal scale, we simply place data into various categories. It has no order or structure. For example, the Yes/No or True/False scale used in data collection and analysis is nominal scale in which there are two categories. It does not have any order and there is no distance between these two categories, i.e., YES and NO as well as between True and
False. The statistics, which we can apply with the data collected or available in nominal scales are put in the non-parametric group. Examples
that come into the category of non-parametric group are mode, cross tabulation with chi-square, etc.
2. Ordinal scale: This scale comes next to the nominal scale in terms of power of measurement. Ranking is the simplest ordinal scale. In an ordinal scale, we place the given data into various categories in terms of rank or power, or considering some particular trait, value or quality.
For example, when you rank five given types of fruit from most tasty to least tasty, depending on your own taste or upon some pre-determined criteria, you are basically creating an ordinal scale of preference with regard to the taste of the given fruits. There is never an absolute or objective distance between any two points or ranks on this scale. It is subjective in nature and varies from person to person. Ordinal scales interpret gross order only and not the relative positional
distances between or among various ranks. This scale uses non-parametric statistics for analysis and interpretation of the data. Some of the examples of statistics used for this scale are as follows:
a. Median and mode
b. Rank order correlation
c. Non-parametric analysis of variance
3. Interval scale: This scale assumes and keeps an equal distance between two consecutive points or elements presented on the scale of measurement. This is the scale in which the interpretation of differences in the distance along the scale is possible and comprehensible. It is different from an ordinal scale. In an ordinal scale, you can only present differences in the order of the various categories or points, but not differences in the degree of that order. Interval scales can also be defined by metrics such as logarithms; in such cases, the distances between various points on the scale are not equal, but they are strictly definable based on the metric system used. With interval-scale data, parametric statistical techniques are used for analysis. Some examples of the statistics used are as follows:
· Mean
· Standard deviation
· Correlation – r
· Regression
· Analysis of variance (ANOVA)
· Factor analysis
· Full range of advanced multivariate and modeling techniques
Note: You can use non-parametric techniques with interval as well as ratio data, but non-parametric tests are less powerful than the parametric tests.
4. Ratio scale: The ratio scale is the top-level scale of measurement. It is generally not available in the field of social science research. One of the most important salient features of the ratio scale is that it has a true zero point. The simplest and most easily comprehensible example of a ratio scale is the measurement of distance or length (disregarding any philosophical points about the method of defining and identifying zero distance). It may be differentiated from the interval scale in an easy way.
For example, the centigrade scale of temperature has a zero point which is arbitrary in nature. The Fahrenheit scale has an equivalent of this point at 32° (0° on the centigrade scale is equal to 32° on the Fahrenheit scale). Hence, even though temperature appears to be on a ratio scale, in reality it is on an interval scale. Similar to interval scale data, ratio scale data also uses parametric statistical techniques. Examples of these techniques are as follows:
· Mean
· Standard deviation
· Correlation – r
· Regression
· Analysis of variance (ANOVA)
· Factor analysis
· Full range of advanced multivariate and modeling techniques
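Since the parametric techniques listed above include the correlation coefficient r, here is a minimal sketch (with invented interval-scale data) that computes Pearson's r from first principles:

```python
# Sketch: Pearson's correlation coefficient r for two made-up interval-scale variables.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]        # e.g. hours studied (invented)
y = [50.0, 57.0, 61.0, 70.0, 72.0]    # e.g. test scores (invented)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))

r = cov / (sd_x * sd_y)
print("Pearson r:", round(r, 3))      # close to +1 here: a strong positive relationship
```

Correlation analysis itself is taken up later in this unit; the sketch merely shows the kind of computation that interval and ratio data permit.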
4.3.1 Presentation of Data in Sequence: Grouping, Tabulation and Graphical Representation
After the collection of data, it needs to be presented in a systematic form so that it can be managed properly. Raw data conveys no full meaning until it is arranged properly; hence, it is very important to present data in an arranged form. There are two methods of presenting data. These are:
1. Grouping or tabular presentation
2. Graphical and diagrammatic presentation
Grouping and Tabulation of Data
Grouping and tabular presentation of data refers to the presentation of data in tables in a systematic organization of groups, i.e., in rows and columns.
Raw data
The mass of data in its original form as collected from the field or through experiment is called as raw data. This is the data that have not been arranged in any numerical form. It is simply collected data and have not been treated at all. Each single item in the raw data may be in the form of score, value, number, grade, observation, etc. An example of raw data is given here. This data depicts the scores of students of class IX on a test in English language having maximum marks of 20.
13, 12, 13, 12, 11, 14, 12, 11, 12, 15, 14, 10, 14, 13, 12, 13, 13, 12, 10, 11, 11, 12, 13, 14,
11, 12, 13, 14, 15 and 12.
Array
When the mass of collected raw data is arranged in order of magnitude, whether increasing or decreasing, it is called an array. The preceding raw data is arranged in the following array:
10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 14,
14, 14, 14, 14, 15, and 15.
Frequency distribution
The commonest way of organizing raw data is to form a frequency distribution. It may be done in the form of tally mark and frequencies (Table 4.3) and grouped frequency distribution (Table 4.4).
In the tally marks and frequency table (Table 4.3), simply fill in the numbers in column 1 in ascending or descending order. Enter a tally mark in the second column for each time a number or score appears in the distribution. A count of five for a score may be shown by drawing four lines and crossing them with a fifth. Write the total frequency of that score in the third column. This type of frequency distribution is used when the data is small in size.
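The same tally-and-frequency procedure can be reproduced for the class IX scores listed above; the short sketch below counts how often each score occurs:

```python
# Frequency distribution of the class IX English scores given above.
from collections import Counter

scores = [13, 12, 13, 12, 11, 14, 12, 11, 12, 15, 14, 10, 14, 13, 12,
          13, 13, 12, 10, 11, 11, 12, 13, 14, 11, 12, 13, 14, 15, 12]

frequencies = Counter(scores)
for score in sorted(frequencies):
    tally = "|" * frequencies[score]               # stand-in for tally marks
    print(f"{score:>3}  {tally:<10} {frequencies[score]}")
print("Total:", sum(frequencies.values()))         # 30 scores in all
```

Sorting the keys reproduces the array shown earlier, and the counts (2, 5, 9, 7, 5 and 2 for the scores 10 to 15) form the frequency column of the table.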