STATISTICS IN MEASUREMENT AND EVALUATION-I
4.0 INTRODUCTION
In the previous unit, you learnt about psychological testing. In this unit, the discussion will turn towards statistics in measurement and evaluation. Statistics is the science of collecting and analysing numerical data in large quantities, particularly for the purpose of inferring proportions in a whole from those in a representative sample.
One can analyse quantitative data through techniques such as measures of central tendency. Measures of central tendency are of various types, such as the arithmetic mean, mode and median. The arithmetic mean is also commonly known simply as the mean. Even though an average, in general, can be any measure of central location, when we use the word average in our daily routine, we usually mean the arithmetic average.
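As a quick illustration of these measures, the following Python sketch (the scores are invented purely for illustration) computes the arithmetic mean, median and mode of a small set of values using the standard library:

```python
# Illustrative only: a small, made-up set of test scores.
from statistics import mean, median, mode

scores = [12, 15, 11, 15, 13, 14, 15, 10, 12, 13]

print("Arithmetic mean:", mean(scores))   # sum of the scores / number of scores
print("Median:", median(scores))          # middle value of the ordered scores
print("Mode:", mode(scores))              # most frequently occurring score
```

For this made-up data the mean is 13, the median is 13 and the mode is 15, which already shows that the three measures need not coincide.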
4.1 UNIT OBJECTIVES
After going through this unit, you will be able to:
· Discuss how data is interpreted
· Describe how data is represented graphically
· Discuss the different measures of central tendency
· Illustrate how mean, mode, and median can be calculated
· Explain correlation analysis
· Describe the normal probability curve
4.2 STATISTICAL TREATMENT OF DATA
Meaning, importance and steps involved in processing data
Research does not merely consist of data that is collected. Research is incomplete without proper analysis of the collected data. Processing of data involves analysis and manipulation of the collected data by performing various functions. The data has to be processed in accordance with the outline laid down at the time of developing the research plan. Processing of data is essential for ensuring that all relevant data has been collected to perform comparisons and analyses. The functions that can be performed on data are as follows:
· Editing
· Coding
· Tabulation
· Classification
Experts are usually of the opinion that the exercises of processing and analysing data are inter-related, and that the two should therefore be thought of as one and the same thing. It is argued that analysis of data generally involves a number of closely-related operations, which are carried out with the objective of summarizing the collected data and organizing it in such a way that it answers the research questions associated with it.
However, in technical terms, processing of data involves representing the data in a way that is open to analysis. Similarly, the analysis of data is defined as the computation of certain measures along with searching for patterns of relationship that may exist among the data groups.
Editing of data
Editing of data involves the testing of data collection instruments in order to ensure maximum accuracy. This includes checking the legibility, consistency and completeness of the data. The editing process aims at avoiding equivocation and ambiguity. The collected raw data is also examined to detect errors and omissions, if any. A careful scrutiny is performed on the completed questionnaires and schedules to ensure that the data has the following features:
· Accuracy
· Consistency
· Unity
· Uniformity
· Effective arrangement
The stages at which editing should be performed can be classified as follows:
· Field Editing: This involves reviewing the reporting forms, by the investigator, that are written in an abbreviated or illegible form by the informant at the time of recording the respondent's responses. This type of editing must be done immediately after the interview. If performed after some time, such editing becomes complicated for the researcher, as it is difficult to decipher any particular individual's writing style. The investigator needs to be careful while field editing and must refrain from correcting errors or omissions by guesswork.
· Central Editing: This kind of editing involves a thorough editing of the entire data by a single editor or a team of editors. It takes place when all the schedules created according to the research plan have been completed and returned to the researcher. Editors correct errors such as data recorded in the wrong place or data recorded in months when it should have been recorded in weeks. They can provide an appropriate answer to incorrect or missing replies by reviewing the other information in the schedule. At times, the respondent can be contacted for clarification. In some cases, if the answer is inappropriate or incomplete and an accurate answer cannot be determined on any basis, the editor should delete or remove that answer from the collected data and put a note such as 'no answer' in its place. Answers that can easily be identified as wrong should be dropped from the final results.
Besides using the above-stated methods according to the data source, the researcher should also keep in mind the following points while editing:
· Familiarity with the instructions given to interviewers and coders
· Know-how of editing instructions
· Single-line striking for deletion of an original entry
· Standardized and distinctive editing of data
· Initialling of all answers that are changed
Coding of data
Coding of data can be defined as representing the data symbolically using some predefined rules. Once data is coded and summarized, the researcher can analyse it and relationships can be found among its various categories.
Checklist for coding
This enables the researcher to classify the responses of the individuals according to a limited number of categories or classes. Such classes should possess the following important characteristics:
· Classes should be appropriate and in accordance with the research problem under consideration.
· They must include a class for every data element.
· There should be mutual exclusivity, which means that a specific answer can be placed in one and only one cell of a given category set.
· The classes should be one-dimensional. This means that every class is defined in terms of only one concept.
Significance of Coding
Coding of data is necessary for its efficient analysis. Coding facilitates the reduction of data from a wide variety of responses to a small number of classes. Thus, only that information which is important and critical for analysis is retained in the research. Coding decisions are usually taken at the designing stage of the questionnaire. This makes it possible to pre-code the questionnaire choices, which, in turn, is helpful for computer tabulation.
However, in the case of hand coding, some standard method should be used. One such method is to code in the margin with a coloured pencil. The other method is to transcribe the data from the questionnaire to a coding sheet. Whatever method is adopted, you should ensure that coding errors are altogether eliminated or reduced to a minimum level.
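As a minimal sketch of how such coding can be carried out programmatically (the category labels and numeric codes below are hypothetical, not taken from any particular questionnaire), the following snippet maps raw responses to predefined codes and summarizes them:

```python
# Hypothetical example: assigning predefined numeric codes to survey responses.
from collections import Counter

# Predefined coding scheme (mutually exclusive: one code per possible answer).
codes = {"yes": 1, "no": 2, "undecided": 3}

raw_responses = ["yes", "no", "yes", "undecided", "yes", "no"]

# Replace each response with its code; an unforeseen answer would need a new code.
coded = [codes[r] for r in raw_responses]
print("Coded responses:", coded)              # [1, 2, 1, 3, 1, 2]
print("Frequency of each code:", Counter(coded))
```

Once responses are reduced to codes in this way, frequencies and cross-tabulations over the coded classes become straightforward.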
Classification of Data
Research studies involve extensive collection of raw data and usage of the data to implement the research plan. To make the research plan easier, the data needs to be classified in different groups for understanding the relationship among the different phases of the research plan. Classification of data involves arrangement of data in groups or classes
on the basis of some common characteristics. The methods of classification can be divided under the following two headings:
· Classification according to attributes
· Classification according to class intervals
Figure 4.1 shows the categories of data.
Fig. 4.1 Data Classification
Classification of data according to attributes
Data is classified on the basis of similar features as follows:
· Descriptive classification: This classification is performed according to qualitative features and attributes which cannot be measured quantitatively. These features can be either present or absent in an individual or an element. The features related to descriptive classification of attributes can be literacy, sex, honesty, solidarity, etc.
· Simple classification: In this classification, the elements of data are categorized on the basis of those that possess the concerned attribute and those that do not.
· Manifold classification: In this classification, two or more attributes are considered simultaneously and the data is categorized into a number of classes on the basis of those attributes. The total number of classes of the final order is given by 2^n, where n = number of attributes considered, as illustrated in the sketch below.
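To see why n attributes considered together yield 2^n classes of the final order, the short sketch below (with two made-up attributes) enumerates every combination of presence and absence:

```python
# Illustrative sketch: two attributes considered together give 2**2 = 4 classes.
from itertools import product

attributes = ["literate", "employed"]          # hypothetical attributes
classes = list(product(["present", "absent"], repeat=len(attributes)))

for combo in classes:
    print(dict(zip(attributes, combo)))        # one class of the final order
print("Number of classes:", len(classes))      # 2**n = 4
```

With a third attribute the same enumeration would give 2^3 = 8 classes, and so on.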
Classification of data according to class intervals
Classifying data according to the class intervals is a quantitative phenomenon. Class intervals help categorize the data with similar numerical characteristics, such as income,
production, age, weight, etc. Data can be measured through some statistical tools like mean, mode, median, etc. The different
categories of data according to class intervals are as follows:
· Statistics of variables: This term refers to measurable attributes, as these typically vary over time or between individuals. The variables can be discrete, i.e., taking values from a countable or finite set; continuous, i.e., having a continuous distribution function; or neither. This concept of a variable is widely utilized in the social, natural and medical sciences.
· Class intervals: They refer to a range of values of a variable. This interval is used to break up the scale of the variable in order to tabulate the frequency distribution of a sample. A suitable example of such data classification is the categorization of the birth rate of a country. In this case, babies aged zero to one year will form a group; those aged two to five years will form another group, and so on. The entire data is thus categorized into a number of groups or classes, in other words, class intervals. Each class interval has an upper limit as well as a lower limit, which are known as the class limits. The difference between two class limits is known as the class magnitude. Classes can have equal or unequal class magnitudes.
The number of elements which come under a given class is called the frequency of the given class interval. All class intervals, with their respective frequencies, are taken together and described in a tabular form called the frequency distribution.
Problems related to classification of data
The problems related to classification of data on the basis of class intervals are divided into the following three categories:
(i) Number of classes and their magnitude: There are differences regarding the number of classes into which data can be classified. As such, there are no pre-defined rules for the classification of data. It all depends upon the skill
and experience of the researcher. The researcher should display the data in such a way that it should be clear and meaningful to the analyst.
As regards the magnitude of classes, it is usually held that class intervals should be of equal magnitude, but in some cases unequal magnitudes may result in a better classification. It is the researcher’s objective and judgement that plays a significant role in this regard. In general, multiples of two, five and ten are preferred while determining
class magnitudes. H.A. Sturges suggested
the following formula to determine the size of class interval:
i = R / (1 + 3.3 log N)
where,
i = size of class interval.
R = Range (difference between the values of the largest element and smallest element among the given elements).
N = Number of items to be grouped.
Sometimes, data may contain one or two or very few elements with very high or very low values. In such cases, the researcher can use an open-ended interval in the overall frequency distribution. Such intervals can be expressed as 'below two years' or 'twelve years and above'. Although such intervals are not desirable, they sometimes cannot be avoided.
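A minimal sketch of Sturges' suggestion, using made-up values purely for illustration:

```python
# Sketch of Sturges' rule for the size of a class interval.
import math

values = [12, 47, 35, 28, 64, 53, 41, 22, 58, 31, 19, 44]   # made-up data
R = max(values) - min(values)            # range
N = len(values)                          # number of items to be grouped
i = R / (1 + 3.3 * math.log10(N))        # size of class interval
print("Range:", R)
print("Suggested class interval size:", round(i, 2))
```

For these twelve values the range is 52 and the suggested interval size is roughly 11, which the researcher would normally round to a convenient figure such as 10.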
(ii) Choice of class limits: While choosing class limits, the researcher must determine the mid-point of a class interval. A mid-point is generally derived by taking the sum of the upper and lower limits of a class and dividing it by two. The mid-point and the actual average of the elements of that class interval should remain as close to each other as possible. In accordance with this principle, the class limits should be located at multiples of two, five, ten, twenty, hundred and such other figures. The class limits can generally be stated in any of the following forms:
o Exclusive type class intervals: These intervals are usually stated as follows:
• 10–20
• 20–30
• 30–40
• 40–50
These intervals should be read in the following way:
• 10 and under 20
• 20 and under 30
• 30 and under 40
• 40 and under 50
In the exclusive type of class intervals, the elements whose values are equal to the upper limit of a class are grouped in the next higher class. For example, an item whose value is exactly thirty would be put in the 30–40 class interval and not in the 20–30 class interval. In other words, an exclusive type of class interval is one in which the upper limit of a class interval is excluded and items with values less than the upper limit, but not less than the lower limit, are put in the given class interval.
o Inclusive type class intervals: These intervals are normally stated as follows:
• 11–20
• 21–30
• 31–40
• 41–50
These should be read as follows:
• 11 and under 21
• 21 and under 31
• 31 and under 41
• 41 and under 51
In this method, the upper limit of a class interval is also included in the concerned class interval. Thus, an element whose value is twenty will be put in the 11–20 class interval. The stated upper limit of the class interval 11–20 is twenty, but the real upper limit is 20.999999, and as such the 11–20 class interval really means eleven and under twenty-one. When the data to be classified happens to be discrete, the inclusive type of classification should be applied. But when the data happens to be continuous, the exclusive type of class intervals can be used.
(iii) Determining the frequency of each class: The frequency of each class can be determined using tally sheets or mechanical aids. In tally sheets, the class groups are written on a sheet of paper and, for each item, a stroke (a small vertical line) is marked against the class group in which it falls. The general practice is that after every four small vertical lines in a class group, the fifth line for an element falling in the same group is drawn as a diagonal line through the four lines. This enables the researcher to count the elements in each of the class groups. Table 4.1 displays a hypothetical tally sheet.
Table 4.1 A Tally Sheet
[Table 4.1 is not reproduced here: it lists class groups with their tally marks and frequencies.]
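A small sketch (with made-up values and class limits) of how the tally-sheet counting described above can be done programmatically, using the exclusive convention in which a value equal to an upper limit falls into the next higher class:

```python
# Sketch: counting the frequency of each class interval (exclusive type),
# where an item equal to an upper limit goes into the next higher class.
values = [12, 30, 25, 40, 18, 33, 27, 45, 30, 22]          # made-up data
limits = [(10, 20), (20, 30), (30, 40), (40, 50)]          # exclusive intervals

frequencies = {f"{lo}-{up}": 0 for lo, up in limits}
for v in values:
    for lo, up in limits:
        if lo <= v < up:                 # 'lo and under up'
            frequencies[f"{lo}-{up}"] += 1
            break

print(frequencies)   # the two 30s are counted in 30-40, not in 20-30
```

The resulting dictionary plays the same role as the frequency column of the tally sheet.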
In case of large inquiries and surveys, class frequencies can be determined by means of mechanical aids, i.e., with the help of machines. Such machines function, either manually or automatically and run on electricity. These machines can sort out
cards at a speed of around 25,000 cards per hour. Although this method increases the speed,
it is an expensive method.
Tabulation of data
In simple terms, tabulation means placing the results and data collected from research in a tabular form.
Methods of tabulation
Tabulation can be done either manually or mechanically using various electronic devices. Several factors like the size and type of study, cost considerations, time pressures
and availability of tabulating machines decide the choice of tabulation. Relatively large data requires computer tabulation. Manual tabulation is preferred in case of small inquiries, when the number of questionnaires is small and they are of relatively short length. The different methods used in hand tabulation are as follows:
• Direct tally method:
This method involves simple codes, which the
researcher can use to directly
tally data with the questionnaire. The codes are written on a sheet of paper
called tally sheet and for each response, a stroke is marked against
the code in which it falls. Usually, after every four strokes against a particular code, the fifth response is indicated by drawing a diagonal or horizontal line through the strokes. These groups are easy to count and the data is sorted against each code conveniently.
• List and tally method: In this method, code responses may be transcribed into a large worksheet, allowing a line for each questionnaire. This facilitates listing of a large number of questionnaires in one worksheet. Tallies are then made for each question.
• Card sort method: This is the most flexible hand tabulation method, where the data is recorded on special cards that are of convenient sizes and shapes and have a series of holes. Each hole in the card stands for a code. When the cards are stacked, a needle passes through
a particular hole representing a particular code. These cards are then separated and counted. In this way,
frequencies of various codes can be found out by the repetition of this technique.
Significance of tabulation
Tabulation enables the researcher to arrange data in a concise and logical order. It summarizes the raw data
and displays the same in a compact form for further analysis. It helps in the orderly arrangement of data in rows
and columns. The various
advantages of tabulation of data are as follows:
· A table saves space and reduces descriptive and explanatory statements to the minimum.
· It facilitates and eases the comparison process.
· Summation of elements and detection of omissions and errors become easy in a tabular description.
· A table provides a basis for various statistical computations.
Checklist for tables
A table should communicate the required information to the reader in such a way that it becomes easy for him/her to read, comprehend and recall information when required. Certain conventions have to be followed during tabulation of data. These are as follows:
· All tables should have a clear, precise and adequate title to make them intelligible enough without any reference to the text.
· Tables should be featured with clarity and readability.
· Every table should be given a distinct number to facilitate easy reference.
· The table should be of an appropriate size and tally with the required information.
· Headings for columns and rows should be in bold font letters. It is a general rule to include an independent variable in the left column or the first row. The dependent variable is contained in the bottom row or the right column.
· Numbers should be displayed such that they are neat and readable.
· Explanatory footnotes, if any, regarding the table should be placed directly beneath the table, along with the reference symbols used in the table.
· The source of the table should be indicated just below the table.
· The table should contain thick lines to separate data under one class from the data under another class and thin lines to separate the different subdivisions of the classes.
· All column figures should be properly aligned.
· Abbreviations should be avoided in a table to the best possible extent.
· If data happens to be large, then it should not be crowded in a single table. It makes the table unwieldy and inconvenient.
Tabulation can also be classified as simple and complex. The former type of tabulation gives information about one or more groups of independent variables, whereas the latter shows the division of data in two or more categories.
Use of statistical tools for analysis
A researcher needs to be familiar with different statistical methods so as to be able to use the appropriate method in his research study. There are certain basic statistical methods that can be classified into the following three groups:
· Descriptive statistics
· Inferential statistics
· Measures of central tendency and dispersion
Descriptive statistics
According to Smith, descriptive statistics is the formulation of rules and procedures where data can be
placed in a useful and significant order. The foundation of applicability of descriptive statistics is the need for complete data presentation. The most important and general methods used in descriptive statistics are as follows:
· Ratio: This indicates the relative frequency of the various variables to one another.
· Percentage: Percentages (%) can be derived by multiplying a ratio by 100. A percentage is thus a ratio expressed per a standard unit of 100.
· Frequency table: It is a means to tabulate the rate of recurrence of data. Data arranged in such a manner is known as a distribution. In the case of a large distribution, larger class intervals are used. This enables the researcher to acquire a more orderly system.
· Histogram: It is the graphical representation of a frequency distribution table. The main advantage of graphical representation of data in the form of a histogram is that the data can be interpreted immediately.
· Frequency polygon: It is used for the representation of data in the form of a polygon. In this method, a dot representing the frequency of each class is placed at the middle of the class interval, and a frequency polygon is derived by linking these dots. An additional class is sometimes added at the end of the line with the purpose of creating an anchor.
· Cumulative frequency curve: The cumulative frequency is obtained by starting from the lowest class interval and adding the frequencies class by class. This facilitates the representation of the number of persons that fall below each class interval. The researcher can derive a curve from the cumulative frequency table with the purpose of reflecting the data in a graphical manner, as sketched after this list.
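The following sketch (with invented scores) computes several of the descriptive measures listed above: a percentage, a frequency table over class intervals and the cumulative frequencies from which a cumulative frequency curve could be drawn.

```python
# Illustrative sketch of basic descriptive measures on made-up scores.
scores = [34, 45, 52, 47, 38, 55, 61, 43, 49, 58, 36, 50]

# Percentage: the ratio of passing scores (40 or above, an assumed cut-off) times 100.
passed = sum(1 for s in scores if s >= 40)
print("Percentage passed:", 100 * passed / len(scores))

# Frequency table over class intervals of width 10 (exclusive type).
intervals = [(30, 40), (40, 50), (50, 60), (60, 70)]
freq = [sum(1 for s in scores if lo <= s < up) for lo, up in intervals]
print("Frequency table:", list(zip(intervals, freq)))

# Cumulative frequencies, adding class by class from the lowest interval.
cumulative, running_total = [], 0
for f in freq:
    running_total += f
    cumulative.append(running_total)
print("Cumulative frequencies:", cumulative)
```

Plotting the frequencies against the intervals would give the histogram or frequency polygon, and plotting the cumulative frequencies would give the cumulative frequency curve.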
Inferential statistics
Inferential statistics enable researchers to draw conclusions that go beyond the data actually observed. Researchers can make deductions or statements, using inferential statistics, with regard to the broad population from which samples of known data have been drawn. These methods are called inferential or inductive statistics. They include the following common techniques:
· Estimation: It is the calculated approximation of a result, which is usable even if the input data is incomplete or uncertain. It involves deriving an approximate calculation of a quantity or a degree, for example, estimating the cost of a project or deriving a rough idea of how long the project will take.
· Prediction: It is a statement or claim that a particular event will occur in the future. It is based on observation, experience and scientific reasoning about what will happen in the given circumstances or situations.
· Hypothesis Testing: A hypothesis is a proposed explanation whose validity can be tested. Hypothesis testing attempts to validate or disprove preconceived ideas. In creating a hypothesis, one thinks of a possible explanation for an observed behaviour. The hypothesis dictates the data that should be selected and analysed for further interpretation; a minimal sketch follows this list.
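As a minimal, hedged illustration of hypothesis testing (both the sample and the hypothesized population mean of 50 are invented), the sketch below uses a one-sample t-test from SciPy to ask whether the sample mean differs from the hypothesized value:

```python
# Illustrative sketch: one-sample t-test against a hypothesized population mean.
from scipy import stats

sample = [52, 48, 55, 60, 47, 53, 58, 50, 49, 57]   # made-up sample
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

print("t statistic:", round(t_stat, 3))
print("p value:", round(p_value, 3))
# A small p value (say, below 0.05) would lead us to reject the idea that the
# population mean is 50; a large p value would give us no reason to reject it.
```

The same logic, with different test statistics, underlies most of the inferential techniques mentioned above.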
There are also two chief statistical methods based on the tendency of data to cluster or scatter. These methods are known as measures of central tendency
and measures of dispersion. You will learn about these in the subsequent section.
4.2.1 Interpretation of Data
Analysis of data is the process of transforming data for the purpose of extracting useful information, which in turn facilitates the discovery of useful conclusions. Drawing conclusions from the analysed data is known as interpretation of data. In the case of experimental or survey data, the analysis also involves estimating the values of unknown population parameters and testing hypotheses about them.
Analysis of data can be either descriptive or inferential. Inferential analysis is also known as statistical analysis. The descriptive analysis is used to describe the basic features of the data in a study, such as persons, work groups and organizations. The inferential analysis is used to make inferences from the data, which means that we are trying to
understand some process and make some possible predictions based
on this understanding.
Types of analysis
The various types of analyses are as follows:
· Multiple Regression Analysis: This analysis is used to predict a single dependent variable from a set of independent variables. In multiple regression analysis, it is assumed that the independent variables are not correlated with one another. A rough sketch of such a fit follows this list.
· Multiple Discriminant Analysis: In multiple discriminant analysis, there is a single dependent variable that is very difficult to measure directly. One of the main objectives of this type of analysis is to understand group differences and to predict the likelihood that an entity, i.e., an individual or an object, belongs to a particular class or group based on several metric independent variables.
· Canonical Analysis: It is a method for assessing the relationship between variables. This analysis also allows you to investigate the relationship between two sets of variables.
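A rough sketch of a multiple regression fit (all numbers invented), predicting one dependent variable from two independent variables by ordinary least squares with NumPy:

```python
# Sketch: multiple regression of y on two independent variables via least squares.
import numpy as np

x1 = np.array([1, 2, 3, 4, 5, 6], dtype=float)        # made-up predictor 1
x2 = np.array([2, 1, 4, 3, 6, 5], dtype=float)        # made-up predictor 2
y = np.array([5, 6, 11, 12, 17, 18], dtype=float)     # made-up dependent variable

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print("Intercept and slopes:", np.round(coef, 3))     # close to [1, 2, 1] here
print("Predicted values:", np.round(X @ coef, 2))
```

The fitted coefficients describe how the dependent variable changes with each independent variable when the other is held constant.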
Univariate, Bivariate and Multivariate Analyses
There are also many types
of analyses performed according to the variance that
exists in data. They are carried out to check if the differences among three or more variables are so significant that data has to be evaluated statistically. There are three types of such analyses; namely, univariate, bivariate and multivariate analyses. These types are discussed as follows:
· Univariate Analysis: In this analysis, only a single variable is taken into consideration. It is usually the first activity pursued while analysing data. It is performed with the purpose of describing each variable in terms of mean, median or mode, and variability. Examples of such analysis are averages or a set of cases that may come under a specific category amidst a whole sample.
· Bivariate Analysis: This type of analysis examines the relationship between two variables. It tries to find the extent of association that exists between these variables. Thus, a bivariate analysis may help you, for example, to find whether the variables of irregular meals and migraine headaches are
associated. It may also help to find the extent to which they may be associated. Here, two variables are thus statistically measured simultaneously.
· Multivariate Analysis: This type of analysis involves observation and analysis of three or more statistical variables at a time. Such an analysis is performed using statistical tests or even a tabular format. Thus, for example, using the multivariate analysis method, you can study the variables of age, educational qualification and annual income of a given set of population at the same time.
Usually, these types of analyses are more convenient when performed in a tabular format. Multivariate analysis involves using a cross-classification or contingency table. Such a table is made up of rows and columns showing the frequencies of two variables, one displayed in the rows and the other in the columns. This is more popularly known as constructing the bivariate table. Traditionally, the independent variable is displayed in the columns and the dependent one in the rows. A multivariate table, if related to the same data, is the result of combining the bivariate tables. In this case, each bivariate table is known as a partial table. Usually, a multivariate table is created with the purpose of explaining or replicating the primary relationship that is found in the bivariate table. Tables 4.2 (a) and (b) show an example of a bivariate table and a multivariate table.
Table 4.2 (a) Bivariate Table

|                               | 1991        | 1992        | 1993        |
|-------------------------------|-------------|-------------|-------------|
| Percentage of students failed | 33 per cent | 38 per cent | 42 per cent |
| Percentage of students passed | 67 per cent | 62 per cent | 58 per cent |

Table 4.2 (b) Multivariate Table

|                                              | 1991 (First Attempt) | 1992 (Second Attempt) | 1993 (Third Attempt) |
|----------------------------------------------|----------------------|-----------------------|----------------------|
| Percentage of students who passed in Maths   | 27 per cent          | 35 per cent           | –                    |
| Percentage of students who passed in English | 53 per cent          | 60 per cent           | 44 per cent          |
Although the data in both tables is
related, as the variable of attempts is distinct,
the multivariate table has been displayed separately in this example. However, you should note that the tables have dealt simultaneously with two or more variables of the data.
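A brief sketch of how such a bivariate (contingency) table can be constructed programmatically; the records below are invented and are not the data behind Table 4.2:

```python
# Sketch: building a bivariate (contingency) table of percentages with pandas.
import pandas as pd

# Hypothetical records: one row per student.
df = pd.DataFrame({
    "year": [1991, 1991, 1992, 1992, 1993, 1993, 1993, 1991],
    "result": ["passed", "failed", "passed", "passed",
               "failed", "passed", "failed", "passed"],
})

# Rows hold the dependent variable (result), columns the independent one (year).
table = pd.crosstab(df["result"], df["year"], normalize="columns") * 100
print(table.round(1))       # percentage of students passed/failed in each year
```

Adding a third variable (for example, attempt number) as a further index would give the kind of partial tables that make up a multivariate table.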
Data interpretation
Data interpretation refers to the identification of trends in different variables. The researcher uses statistics for this purpose. The researcher is required to be familiar with the knowledge of the scales of measurement. This enables him/her to choose
the appropriate statistical method for his/her research
project. The scales of measurement facilitate the allotment of numerical values to characteristics adhering to specific rules. Measurement is also related to the levels of measurement of data, namely the nominal, ordinal, interval and ratio levels. These levels can be explained as follows:
· Nominal measurement: The nominal measurement assigns a numerical value to a specific characteristic. It is the most fundamental form of measurement. The nominal measurement represents the lowest level of data available for measurement.
· Ordinal measurement: This type of measurement involves allotting a numerical value to a specific feature in terms of a specific order. The ordinal scale displays the way in which the entity is measured. The ordinal scale of measurement is used to calculate and derive data pertaining to the median, percentage, rank order, correlations and percentiles.
· Interval measurement: A researcher can depict the difference between one aspect of the data and another using this level of measurement. The interval scale of measurement is useful for the researcher in several ways. It can be applied to calculate arithmetic means, averages and standard deviations, and to determine the correlation between different variables.
· Ratio measurement: In this method, there are fixed proportions (ratios) between the numerical values and the amounts of the characteristic they represent. A researcher should remember while measuring at the ratio level that there exists a fixed zero point. The ratio level of measurement allows researchers to determine whether the aspects being measured possess a certain characteristic. Almost any type of arithmetical calculation can be executed using this scale of measurement.
The most important features of any measuring scale are its reliability and validity, which are explained as follows:
· Reliability: This term deals with consistency and accuracy. A measurement scale can be said to be reliable when it yields consistent results. In other words, when the same researcher repeats a test, i.e., with a different group resembling the original group, he/she should get the same results as before.
· Validity: According to Leedy, validity is the assessment of the soundness and the effectiveness of the measuring instrument, i.e., whether it measures what it is supposed to measure. There are several types of validity, which include the following:
o Content Validity: It deals with the accuracy with which an instrument measures the factors or content of the course or situations of the research study.
o Prognostic Validity: It depends on the possibility of making judgements from results obtained by the concerned measuring instrument. The judgement is future oriented.
o Simultaneous Validity: This involves comparison of one measuring instrument with another that measures the same characteristic and is available immediately.
4.3 FREQUENCY DISTRIBUTION AND GRAPHIC REPRESENTATION OF DATA
Data refers to a recorded description of the observations of an individual or a group of individuals. Examples of data may be the marks achieved by students on a test, recorded observations regarding the availability of various types of flowers in a park, recorded rates of various types of vegetables available in the local market, prices of different types of laptops made by various companies, etc.
Data may be collected from the following two important sources:
1. External sources: These may be divided into the following three categories:
(i) Primary sources: These are the sources which are original in nature, i.e., the data is collected from the real object regarding which the research is going on.
(ii) Secondary sources: These are the sources that use data collected through the primary sources.
(iii) Tertiary sources: These sources filter data from secondary sources.
2. Internal sources: The data collected or created by the organization itself are called internal sources. Large organizations generally create their own data. For example, the Central government generates or collects large volumes of data on population through the census, data related to the enrolment of students at various levels of schooling, literacy rates at the state and national levels, and so on. All reports submitted by various committees and commissions of the government constitute internal data.
Nature of Data
Data may be numerical like 2, 4, 150 and 6, and so on, or categorical, for example, republican, democrat, etc. As there is no hierarchy in the categories, it is not ordinal in nature. Numerical data may be discrete or continuous in nature. The discrete data is created either from a finite number or a countable number of possible values. On the other hand, continuous data is created from infinite number of possible values, which may be presented in the form of points on a continuous scale having no gaps or interruptions on it.
In other words, it may be said that discrete data can be counted, whereas continuous data cannot be counted. Some examples of discrete data are the number of cars or motorcycles in a parking lot, fatal airline accidents that have occurred in India, the number of students enrolled in a course, cows grazing in a field, etc. As all of these can be counted, such data can be called discrete data. Examples of continuous data are the weight of a person, the height of an individual or a child, the speed of a car, the temperature measured by a thermometer, the quantity of water coming out of a tap, etc. All these data cannot be measured with absolute precision.
Numerical data may be classified into cardinal and ordinal. In cardinal data, the absolute difference between two given numbers is meaningful (e.g., 10 miles, 8 miles, 6 miles, 2 miles). Notice that there is a difference of two miles between the first two numbers, i.e., 10 miles and 8 miles. It may also be said that 10 miles is one and a quarter times 8 miles and, similarly, that 6 miles is three times 2 miles. Ordinal data gives a comparative picture among its various values. In this case, the difference between two numbers conveys only order and not the absolute difference, as it does for cardinal data. For example, P is either larger than Q, equal to Q or less than Q.
Scales of measurement
Measurement is the process of assigning numbers or symbols or grades to objects or events in a well-designed manner. There are four scales of measurements, listed as follows:
1. Nominal scale
2. Ordinal scale
3. Interval scale
4. Ratio scale
1. Nominal scale: Statistically, this scale is the lowest measurement scale or level used by us in data analysis. In nominal scale, we simply place data into various categories. It has no order or structure. For example, the Yes/No or True/False scale used in data collection and analysis is nominal scale in which there are two categories. It does not have any order and there is no distance between these two categories, i.e., YES and NO as well as between True and
False. The statistics, which we can apply with the data collected or available in nominal scales are put in the non-parametric group. Examples
that come into the category of non-parametric group are mode, cross tabulation with chi-square, etc.
2. Ordinal scale: This scale comes next to the nominal scale in terms of power of measurement. Ranking is the simplest ordinal scale. In an ordinal scale, we place the given data into various categories in terms of rank or power, or considering some particular trait, value or quality.
For example, when you rank five given types of fruit from most tasty to least tasty, depending on your own taste or upon some pre-determined criteria, you are basically creating an ordinal scale of preference with regard to the taste of the given fruits. There is never an absolute or objective distance between any two points or ranks on this scale. It is subjective in nature and varies from person to person. Ordinal scales interpret gross order only and not the relative positional
distances between or among various ranks. This scale uses non-parametric statistics for analysis and interpretation of the data. Some of the examples of statistics used for this scale are as follows:
a. Median and mode
b. Rank order correlation
c. Non-parametric analysis of variance
3. Interval scale: This scale assumes and keeps an equal distance between two consecutive points or elements presented on the scale of measurement. This is the scale in which the interpretation of differences in the distance along the scale is possible and comprehensible. It is different from an ordinal scale. In an ordinal scale, you can only present differences in the order of the various categories or points, but not differences in the degree of that order. Interval scales can also be defined by metrics such as logarithms; in such cases, the distances between various points on the scale are not equal, but they are strictly definable based on the metric system used. With interval-scale data, parametric statistical techniques are used for analysis. Some examples of the statistics used are as follows:
· Mean
· Standard deviation
· Correlation – r
· Regression
· Analysis of variance (ANOVA)
· Factor analysis
· Full range of advanced multivariate and modeling techniques
Note: You can use non-parametric techniques with interval as well as ratio data, but non-parametric tests are less powerful than the parametric tests.
4. Ratio scale: The ratio scale is the top-level scale of measurement. It is generally not available in the field of social science research. One of the most important salient features of the ratio scale is that it has a true zero point. The simplest and most easily comprehensible example of a ratio scale is the measurement of distance or length (disregarding any philosophical points about the method of defining and identifying zero distance). It may be differentiated from the interval scale in an easy way.
For example, the centigrade scale of temperature has a zero point which is arbitrary in nature. The Fahrenheit scale has an equivalent of this point at 32° (0° on the centigrade scale is equal to 32° on the Fahrenheit scale). Hence, even though temperature appears to be on a ratio scale, in reality it is on an interval scale. Similar to interval scale data, ratio scale data also uses parametric statistical techniques. Examples of these techniques are as follows:
· Mean
· Standard deviation
· Correlation – r
· Regression
· Analysis of variance (ANOVA)
· Factor analysis
· Full range of advanced multivariate and modeling techniques
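Since the parametric techniques listed above include the correlation coefficient r, here is a minimal sketch (with invented interval-scale data) that computes Pearson's r from first principles:

```python
# Sketch: Pearson's correlation coefficient r for two made-up interval-scale variables.
import math

x = [2.0, 4.0, 6.0, 8.0, 10.0]        # e.g. hours studied (invented)
y = [50.0, 57.0, 61.0, 70.0, 72.0]    # e.g. test scores (invented)

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))

r = cov / (sd_x * sd_y)
print("Pearson r:", round(r, 3))      # close to +1 here: a strong positive relationship
```

Correlation analysis itself is taken up later in this unit; the sketch merely shows the kind of computation that interval and ratio data permit.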
4.3.1 Presentation of Data in Sequence: Grouping, Tabulation and Graphical Representation
After the collection of data, it needs to be presented in a systematic form so that it can be managed properly. Raw data conveys no full meaning until it is arranged properly; hence, it is very important to present data in an arranged form. There are two methods of presenting data. These are:
1. Grouping or tabular presentation
2. Graphical and diagrammatic presentation
Grouping and Tabulation of Data
Grouping and tabular presentation of data refers to the presentation of data in tables in a systematic organization of groups, i.e., in rows and columns.
Raw data
The mass of data in its original form as collected from the field or through experiment is called as raw data. This is the data that have not been arranged in any numerical form. It is simply collected data and have not been treated at all. Each single item in the raw data may be in the form of score, value, number, grade, observation, etc. An example of raw data is given here. This data depicts the scores of students of class IX on a test in English language having maximum marks of 20.
13, 12, 13, 12, 11, 14, 12, 11, 12, 15, 14, 10, 14, 13, 12, 13, 13, 12, 10, 11, 11, 12, 13, 14,
11, 12, 13, 14, 15 and 12.
Array
When the mass of collected raw data is arranged in order of magnitude, whether increasing or decreasing, it is called an array. The preceding raw data is arranged in the following array:
10, 10, 11, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 14,
14, 14, 14, 14, 15, and 15.
Frequency distribution
The commonest way of organizing raw data is to form a frequency distribution. It may be done in the form of tally mark and frequencies (Table 4.3) and grouped frequency distribution (Table 4.4).
In the tally marks and frequency table (Table 4.3), simply fill in the numbers in column 1 in ascending or descending order. Enter a tally mark in the second column for each time a number or score appears in the distribution. A count of five for a score may be shown by drawing four lines and crossing them with a fifth. Write the total frequency of that score in the third column. This type of frequency distribution is used when the data is small in size.
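The same tally-and-frequency procedure can be reproduced for the class IX scores listed above; the short sketch below counts how often each score occurs:

```python
# Frequency distribution of the class IX English scores given above.
from collections import Counter

scores = [13, 12, 13, 12, 11, 14, 12, 11, 12, 15, 14, 10, 14, 13, 12,
          13, 13, 12, 10, 11, 11, 12, 13, 14, 11, 12, 13, 14, 15, 12]

frequencies = Counter(scores)
for score in sorted(frequencies):
    tally = "|" * frequencies[score]               # stand-in for tally marks
    print(f"{score:>3}  {tally:<10} {frequencies[score]}")
print("Total:", sum(frequencies.values()))         # 30 scores in all
```

Sorting the keys reproduces the array shown earlier, and the counts (2, 5, 9, 7, 5 and 2 for the scores 10 to 15) form the frequency column of the table.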