Reliability of assessment instruments

Reliability is the ability of a test to produce consistent and stable results; a reliable instrument would be expected to yield similar results (scores) for the same candidate each time it is completed within a specified time frame. Reliability comes into question when this condition is not met. Thus, to ensure consistency, both tools will be examined using the approaches detailed below.

Alternate or Parallel Forms Reliability

(Competency to communicate effectively in writing, written exam)

Alternate forms reliability measures the association between two equivalent versions of an assessment instrument that use differently worded items to measure the same attributes. For two forms to be parallel, they must share the same components: the same number of questions, the same content coverage, and the same response format. Under this method, two separate but similar tools assessing written communication competency would be administered to the same group of individuals within a specific period of time. Each respondent would be expected to receive a similar score on both tests, and the results would be analysed to determine whether the correlation between the two forms is strong enough to establish reliability.

Determining the reliability of the written assessment would first require creating a pool of questions intended to evaluate the communication competency, which would then be randomly divided to create two distinct but similar tests. It would also be necessary to ensure that the individuals selected to complete both assessments are representative of the employees who would potentially occupy these positions. The cohort could be a random sample of current or former supervisors of varying ages and genders with differing levels of functional knowledge and expertise about the role. Once suitable candidates are identified, they would be randomly divided into two groups, and each group would complete both tests at different times. Their responses would be tabulated and scored based on a standard rating guide, with the expectation that the two sets of tests would result in similar scores.
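
As an illustration only, the following Python sketch shows one way the item pool might be randomly split into two parallel forms and the cohort randomly assigned to two groups that sit the forms at different times. The item pool, candidate names, helper functions, and the choice to have the groups take the forms in opposite orders are all hypothetical assumptions, not part of the assessment design described above.

```python
import random

def build_parallel_forms(item_pool, seed=42):
    """Randomly split a pool of exam questions into two equally sized parallel forms."""
    rng = random.Random(seed)
    items = item_pool[:]                 # copy so the original pool is untouched
    rng.shuffle(items)
    midpoint = len(items) // 2
    return items[:midpoint], items[midpoint:midpoint * 2]

def assign_groups(candidates, seed=42):
    """Randomly split candidates into two groups that complete the forms at different times."""
    rng = random.Random(seed)
    people = candidates[:]
    rng.shuffle(people)
    midpoint = len(people) // 2
    # Assumption: Group 1 sits Form A first and Form B later; Group 2 does the reverse.
    return {"Form A first": people[:midpoint], "Form B first": people[midpoint:]}

# Example usage with placeholder data
questions = [f"Q{i}: written communication scenario {i}" for i in range(1, 21)]
supervisors = [f"Supervisor {i}" for i in range(1, 31)]

form_a, form_b = build_parallel_forms(questions)
groups = assign_groups(supervisors)
```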

However, to ensure the results are accurately interpreted, the correlation coefficient could be calculated using Pearson's correlation. Pearson's r can range from -1 to 1; a value closer to 1 would indicate a strong positive correlation between the results of the two assessments, demonstrating high alternate forms reliability. If less than optimal results were achieved, it would be prudent to investigate possible sources of error, such as test administration issues, dichotomous scoring, or whether the clarity or relevance of the questions themselves should come into question.
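
As a minimal sketch of that calculation, assuming each candidate's scores on the two forms are paired, Pearson's r could be computed as follows. The score values and function name are placeholders for illustration only.

```python
from statistics import mean, stdev

def pearson_r(x, y):
    """Compute Pearson's correlation coefficient for paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Score lists must be paired and contain at least two candidates.")
    mx, my = mean(x), mean(y)
    # Sample covariance divided by the product of the sample standard deviations
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return covariance / (stdev(x) * stdev(y))

# Placeholder scores for the same candidates on Form A and Form B
form_a_scores = [72, 85, 90, 64, 78, 88, 70, 95]
form_b_scores = [70, 83, 92, 60, 80, 85, 74, 93]

r = pearson_r(form_a_scores, form_b_scores)
print(f"Alternate forms reliability (Pearson's r): {r:.2f}")  # a value near 1 suggests high reliability
```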

While the creation of two sets of assessment tools can be a costly and arduous exercise, alternate forms reliability can be advantageous when numerous candidates must be assessed for a position. Specifically, it reduces the extent to which individuals can memorize test content should they be required to take the examination on more than one occasion. Its strength lies in the fact that the alternate test looks different but measures the same construct, with content equivalent in difficulty and composition.