Assessment and Evaluation

Assessment Terms Glossary - General Terms

 

Assessment Glossary

With the recent emphasis on assessment, “assessment-related” terms are becoming even more commonly used by educators.  Unfortunately, these terms are often unclear because there are no commonly agreed-upon definitions.  To facilitate discussion of assessment within the district, it is vital that we have a common set of definitions that we can use.  This glossary has been compiled with that goal in mind.

 

General Terms

 

Assessment - The process of gathering information (both quantitative and qualitative).  In education, assessment covers a range of processes used to determine or estimate what students know and can do and how much they have learned.  Assessment can include tests, student learning demonstrations, teacher observations, professional judgment, and other indicators such as graduation rates and surveys.

 

Formative Assessment (Assessment FOR Learning) – All activities undertaken by teachers and their students that provide information to be used as feedback:

·        to adjust instruction to support additional learning,

·        to guide and support student learning, and

·        to support the closing of gaps in learning.

 

Summative Assessment (Assessment OF Learning) – Assessments used to determine how much students have learned at a particular point in time in order to report achievement status.

[NOTE: Notice that the terms “formative” and “summative” refer to how data is USED (whether data is used for adjusting learning/teaching or for evaluating learning). Since assessments are tools designed for a specific purpose, formative or summative goals should be understood and accounted for in assessment designs.]

 

Common Formative Assessment: An assessment or set of assessment items created collaboratively by a team of teachers responsible for the same grade level or course. Assessment data is used to identify students who need additional support, effective teaching strategies, specific areas in which students are having difficulty, and improvement goals for the teachers and the team. (Adapted from Learning By Doing, DuFour et al., 2006)

 

In the book On Common Ground, Rick Stiggins has this to say about common assessments:

“In addition, as a result of this teamwork-based learning experience, teachers can continue to collaborate in the development and use of both assessment OF and FOR learning.  To the extent that we team to (1) analyze, understand, and deconstruct standards, (2) transform them into high-quality classroom assessments, and (3) share and interpret results together, we benefit from the union of our wisdom about how to help our students continue to grow as learners.  Just be cautious and understand that common assessment OF learning may not constitute assessments FOR learning if they do not satisfy the conditions of student involvement spelled out here.   And we must always remain open to the possibility that assessments FOR learning may be unique to a single classroom or even to a single student – and are therefore not always “common.”  But to the extent that teachers can work together to meet the challenges of classroom assessment, we bring the power of the professional community into play to benefit students.”

 

Assessment bias – qualities of an assessment instrument that offend or unfairly penalize a group of examinees because of examinees’ gender, ethnicity, socioeconomic status, religion, or other such group-defining characteristics.

 

Offensiveness – Offensiveness generally occurs when negative stereotypes of certain subgroup members are presented in an assessment.  Other types of offensiveness include slurs, blatant or implied, based on stereotypic negatives about how members of particular groups behave.  Finally, offensiveness can occur when the language used in an assessment isn’t inclusive (such as using the term men instead of people).

 

Unfair penalization – Occurs when a student’s test performance is distorted because of content that, although not offensive, disadvantages the student because of the student’s group membership.

 

Differential (disparate) impact - When members of a particular subgroup perform less well than other students taking the same assessment. Subgroups may be defined in terms of race, ethnicity, gender, socio-economic status, or any other variable that represents identifiable differences. The presence of disparate impact does not necessarily imply that assessment bias is present; however, it does suggest that further scrutiny of the test or test items may be warranted.
 

Essential Outcome: A ‘big idea’ we want students to carry forward when they have let go of some of the details of their learning. This may be thought of as a ‘linchpin’: something that is essential for students to understand and hold onto in order to connect their learning.

 

Evaluation – Using data to form conclusions and make judgments.  Teachers evaluate when they use data gathered from assessments to grade students. Evaluators use data from multiple assessments to draw conclusions about the strengths and weaknesses of educational programs.

 

Portfolio - A way of collecting information for one or more of the following uses: (1) to showcase student work, (2) to describe student performance, or (3) to evaluate student performance.  The term portfolio can refer to both the process associated with collecting information and the product itself, the collection. The key characteristics of effective portfolio systems are: (1) authenticity of instructional activities and assessments, (2) on-going assessment that is aligned with curriculum and instruction, (3) inclusion of assessments that focus on process as well as product, (4) use of assessment results to document growth, (5) collaboration between student and teacher, (6) student self-reflection and evaluation, and (7) support for communication.

 

Reliability – When we assess students we want to generate scores that are consistent.  In educational assessment there are actually four types of consistency:  (1) stability over different assessment occasions, (2) consistency of results among two or more different forms of an assessment, (3) consistency in the way an assessment’s items function, and (4) consistency between scores assigned by two different raters.
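As an illustration of the fourth type of consistency (agreement between two raters), the following is a minimal sketch rather than a prescribed district procedure. It assumes two raters have scored the same set of student essays on a 1-4 rubric; the scores, the function names, and the choice of Cohen's kappa as a chance-corrected index are assumptions made for this example.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of students who received the same score from both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of students")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category at random,
    # given how often each rater used each category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical rubric scores (1-4) for eight student essays
rater_a = [3, 2, 4, 3, 1, 2, 3, 4]
rater_b = [3, 2, 3, 3, 1, 2, 4, 4]

print(percent_agreement(rater_a, rater_b))  # 0.75
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.652
```

In this made-up data the raters agree on six of eight essays. Kappa is lower than the raw agreement because some of that agreement would be expected by chance alone.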

 

Standards: Themes of specific learning objectives that are related to specific content or processes.

 

Performance indicator - A comprehensive description of the overt behaviors (observable performances) that indicate the presence of specific knowledge and/or skills.  A performance indicator is generally a collection of critical factors that allows us to assess what students have attained.

 

Performance Level Descriptors - Similar in nature to performance indicators.  The State DOE uses this term in the assessment portfolio process to describe student performance at various levels of proficiency.
Examples: Elementary report card rubrics and rubrics used with some of the CRT performances. Secondary GDE rubrics for math and writing. Secondary rubrics for Language Arts CRTs.

Performance Standard - The specific performance/product/achievement that sets the criteria for performance on the task in question. Specifies what a student must do and to what degree of mastery.

 

Critical Attributes/Critical Factors -  The key traits or features that characterize performance at a given level. The key traits should be observable features of the skill and knowledge students are expected to possess.

 

Standardized tests - Tests that are administered and scored under conditions uniform to all students (test-takers).  Standardization is a generic concept that can apply to any testing method - from multiple-choice to written essays to performance assessments.  Standardization makes scores comparable and assures, to the extent possible, that test-takers have equal chances to demonstrate what they know.

 

Criterion-referenced tests (CRTs) - Standardized tests that compare a student’s performance to clearly identified learning tasks or skill levels.  The basis for comparison is a body of content knowledge and skills.

 

Norm-referenced tests (NRTs) - Standardized tests that compare a student’s performance to that of other test-takers.  Norms are obtained by administering the test (under the same conditions) to a sample drawn from the population of interest (called the norm group) and then calculating standard scores.
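To make the contrast between criterion-referenced and norm-referenced interpretation concrete, here is a small sketch that reads the same raw score both ways. Everything in it is hypothetical: the norm group scores, the cut score of 30, and the function names are invented for illustration, and the z-score is only one simple kind of standard score. Published tests typically report scaled scores or percentile ranks derived from much larger norm groups.

```python
from statistics import mean, pstdev


def z_score(raw_score, norm_group_scores):
    """Norm-referenced view: how far a score sits from the norm group mean,
    measured in standard deviations of the norm group."""
    mu = mean(norm_group_scores)
    sigma = pstdev(norm_group_scores)  # assumes the norm group is treated as a population
    if sigma == 0:
        raise ValueError("norm group scores have no spread")
    return (raw_score - mu) / sigma


def meets_criterion(raw_score, cut_score):
    """Criterion-referenced view: has the student reached the required level?"""
    return raw_score >= cut_score


# Hypothetical raw scores from a norm group (mean 25, standard deviation 2)
norm_group = [22, 24, 24, 24, 25, 25, 27, 29]

student_score = 28
cut_score = 30  # hypothetical proficiency cut score

print(z_score(student_score, norm_group))  # 1.5
print(meets_criterion(student_score, cut_score))  # False
```

In this example the student scores well above the norm group average yet still falls short of the criterion, which shows why the two interpretations answer different questions.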

 

Tasks/Items – Individual questions (or tasks) on an assessment. These terms are generally used interchangeably.  Traditionally, the term ‘items’ is used in conjunction with paper-and-pencil assessment, whereas the term ‘tasks’ is associated with performance assessment.

 

Validity – Validity refers to the degree to which our score-based inferences about students are defensible.  Another way to think about validity is to pose the question, “Am I measuring what I think I am measuring?”  When we assess students, we take these students’ responses to a set of tasks or items and generate some type of score that summarizes the students’ performances. 


Published: April 5, 2007, Updated: April 5, 2007