Of Carts and Horses in the Methodology of Programme Evaluation: 'effects' and 'effectiveness' in New Zealand's Teacher In-service Programmes in Information Technology.

Paper presented at AQR '99 Association for Qualitative Research International Conference. Melbourne, Australia. 8-10 July 1999.

Dr. Vince Ham
Christchurch College of Education
New Zealand

 

Abstract

 

Since 1991, successive Governments and many schools in New Zealand, as elsewhere, have invested large amounts of money in new computer-based information technologies (IT). Inspiring this investment seem to have been a widely held public perception that developing technological competence, preparing children for their future in the 'information age', has become almost as vital an aspect of schooling as reading and writing, and a widely held professional perception that new technologies have an important role to play as tools for learning 'across the curriculum'. Proponents of both imperatives, and not least teachers themselves, have tended to see the professional development of teachers as one, if not the, key factor in achieving such goals, and since the Sallis Report of 1990, it has been the policy of successive New Zealand governments to commit public funds extensively to the provision of professional development programmes for teachers.

After eight years of continuous, but increasingly multivariate, provision of professional development, the questions arise of how, and how successfully, the delivery models operate? And by whose lights are we to judge? This paper reports the progress of the pilot for a proposed two year multiple case study of such programmes in action. By studying three selected in-service programmes, all sharing the same primary aim of supporting the integration of IT into teachers’ classroom work, but each drawing on a different structural model of professional development delivery, the pilot focused on how various participants and stakeholders themselves judged the effectiveness of their in-service training in IT. Beyond that, the paper seeks to ground these perceptions within the apparently clashing discourses of public (‘stakeholder’) and professional (‘participant’) perceptions of the place and purpose of new information technologies in our classrooms, and ultimately within a methodological debate on what might constitute an ‘authentic’ process evaluation of in-service provision for teachers.

Introduction

Over the last two years the New Zealand Government has announced a significant injection of money into a national "ICT Strategy for Schools". During the period 1998-2001, some $85 million is to be spent on several centrally funded initiatives, all with the general aim of improving the technical and professional infrastructure for the integration of Information and Communications Technologies (ICT) into schools' teaching and learning programmes. In this national Strategy by far the greatest proportion of money has been tagged for programmes for the professional development (PD) of teachers. The nature of these PD programmes has not been centrally prescribed. Rather, schools have been asked to submit their own proposals for programmes of professional development for the Ministry to fund.

The Christchurch College of Education gained the contract to provide an independent evaluation of the various professional development programmes thus proposed, as they are implemented by schools over the next three years. However, while the members of the College team responsible for the evaluation all had considerable experience as facilitators and managers of professional development programmes for teachers in IT, and there had always been a significant component of formal and informal 'self-evaluation' of these, our own programmes, we were relative novices as 'independent' evaluators of the programmes of others. This paper identifies three of the central conceptual and methodological dilemmas faced by the evaluation team and their resolution during the piloting of the evaluation in 1998.

Cart and Horse No. 1: Product and Process in the Evolving Theory of Evaluation?

Conversation at a research study group seminar, Oxford University, UK. December 1997. (Almost verbatim)

Author: The problem I have is that you get the impression from reading the literature that the nature of the in-service in this area is often taken for granted, seen as unproblematic. The teachers always seem to like their INSET. ..They all say it was great, evaluate it positively; almost as if any old INSET would have done, ... but then they don't all put it into practice in the classroom. So maybe the obstacles and barriers are not to do with the type of in-service but in the school, access to equipment, or -

Colleague: Hang on a minute! ...If the INSET didn't lead to change in what teachers do in the classroom, then it's the INSET that is the problem... it was the wrong kind of INSET... Simple as that.

The classic definition of evaluation in education, and one possibly undergoing a resurgence in the new age discourse of purchased PD programmes, provider accountability, and value for money, is the measurement of outcomes in comparison with goals. As Tyler (1950) defined it, evaluation is "the process of determining to what extent the educational objectives are actually being realised" (quoted in Nevo, 1989, p.16). Evolving theories of programme evaluation in education over the last two decades, however, have taken a more comprehensive view of what an evaluation should be and therefore, by implication, the range or types of data to be collected and the modes of analysis to be used.

Highlighting the difficulty of observing intervention effects which are defined only in terms of measurable behaviours, and of knowing precisely what such behaviours or outcomes may or may not be the effects of, recent writers have much expanded the objectives-outcomes approach to evaluation, incorporating a greater focus on the process in between the objectives and the outcomes. Stake (1967), for example, emphasises that the evaluator's task is twofold - both description and judgement - and that therefore three bodies of information need to be studied: the antecedents (everything that might have a bearing on outcomes), the transactions (what actually happens during a programme or educational event), and the outcomes for all participants. Stake also emphasises that the purposes of evaluation may be formative (to improve an ongoing activity) or summative (to hold activities or educators to account). Stufflebeam (1974, in Henderson 1978) follows Stake in stressing the need to collect comprehensive transactional data as well as a knowledge of the aims and objectives, though he transfers the responsibility for judgement from the evaluator to the decision maker or sponsor, identifying the primary purpose of evaluation as being to inform decision making.

Scriven (1967 in Henderson 1978), on the other hand, rejects the notion of goals as the defining origin of educational evaluation on the grounds that it tends to assume that goals derive from the providers of programmes rather than their consumers. Advocating what he calls goal-free evaluation, Scriven argues that needs are a better starting point, emphasising that programmes should derive from and meet the articulated needs of teachers as consumers rather than the imposed goals of programme sponsors. Parlett and Hamilton's (1977) concept of illuminative evaluation takes this a step further. Goal- or even needs-based evaluations tend not to allow for changes in orientation as a result of the programmes themselves. Participants' goals, aims, needs, perspectives even, change during and as a result of the programme being evaluated. Like Stake and Stufflebeam they draw from a participant observation tradition in sociology rather than the effects-measurement tradition of the behavioural sciences, and argue that the main purpose of evaluation is illuminative: that is, it should, indeed can do no more than, shed light on the instructional dynamics of the programme itself. "The evaluator's task is to unravel the complex scene he encounters; isolate its significant features; delineate cycles of cause and effect; and comprehend the relationship between organisational patterns and the responses of individuals." (Quoted in Henderson 1978, p.68) In a sense the judgement aspect of evaluation becomes a judgement about what it is, rather than a judgement about how well it measures up as a set of outcomes. This is also the conclusion reached by the advocates of action research such as Kemmis (1989), who fly a flag for an extreme democratic perspective which sees the role of evaluator as not really involving judgement at all, but merely the facilitation of self-judgement in participants. Simons talks of this as the educative and emancipatory role for evaluation (1987, p.53).

MacDonald's well known description of evaluation as being either autocratic, bureaucratic or democratic, and Weiss's (1989) reconceptualisation of 'stakeholder evaluation', emphasise the political dimension of evaluation. They characterise evaluations in large part according to whose purposes or interests are served (Henderson, 1978; Newton, 1993; Simons, 1987). A stakeholder approach emphasises the need to systematically gather information about each stakeholder's concerns and criteria for success, and to provide rich information of perceived relevance to all those with an investment in the programme. Weiss (1989, 1989a) notes that stakeholder evaluation has not been successfully applied at the macro level, but at least conceptually it addresses some of the criticisms made of outcome based theories of evaluation. In a revision of its original purpose (to inform decision making) she notes that stakeholder evaluation is more likely to be effective and appropriate if the evaluation is more illuminative than goal-oriented in nature. The concepts of 'stakeholders' and 'participants' developed for our own evaluation are somewhat broader than Weiss's and include almost everyone with some direct interest in a given PD programme. However, in the context of the study, the political and stakeholder models discussed by MacDonald and Weiss served as a reminder that every programme can be, and usually is, evaluated from a variety of perspectives using a variety of not necessarily compatible criteria.

Such models, thus, emphasise the point that judging the effectiveness of a programme neither rests on nor prioritises the presuppositions, goals or criteria of any one particular participant or interest group. Rather it derives from, and perhaps can only consist of, an understanding and enunciation of the perspectives and interests of all of them. In the context of INSET provision, McBride (1989), MacLure (1989) and others have questioned whether INSET as it has traditionally been evaluated takes sufficient account of teacher needs, focussing as it does on institutional goals: Is the system institution- or teacher-centred? (McBride 1989, p.1). But perhaps such pleas for a purely teacher focus also buy into an exclusivist polarisation of benefit which assumes that the requirements of one group of interested parties can only be met at the expense of another. This may well be so where the interests in some way conflict. But the more important point in terms of method and procedure is surely that the more comprehensive is the data on which an evaluation is based, the more likely such internal conflicts of interest are to surface, the more accurately an observer can judge which, and whose, needs are actually being addressed, and the more valid an evaluation is therefore likely to be as a process of knowing. As Stake and Denny put it, evaluation is an investigation of worth, not just effect.

"Considered broadly, evaluation is the discovery of the nature and worth of something. In relation to education, we may evaluate, students, teachers, curriculums, administrators, systems, programs and nations. The purposes for evaluation may be many, but always evaluation attempts to describe something and to indicate its perceived merits and shortcomings... Evaluation is not a search for cause and effect, an inventory of present status, or a prediction of future success. It is something of all of these things but only as they contribute to understanding substance, function and worth." (quoted in Kemmis 1989, pp.117-118)

It is possibly for this reason that Newton (1993), explicitly, and Henderson (1978), implicitly, argue that Stake's model of evaluation is the most appropriate for teacher professional development, in that it encompasses and accommodates most of the other perspectives, since it focuses on the key task of seeking and describing all of the various elements of congruence and incongruence among all aspects of a programme. Questioning, at least as an issue of data collection or procedure, whether the division between process evaluation and goal evaluation is even useful in regard to in-service programmes, Newton argues that Stake's model can accommodate both:

[The]... evaluation of process has a significant role to play in INSET evaluation, and therefore needs to be part of any model which guides our approach.... Stake's conceptualisation of the evaluation task, which gives equal weight to the collection and interpretation of transactional data, (that is, data concerned with the activities and interactions of the training process), alongside information relating to outcomes and antecedents, offers a relevant and practical framework. (p. 20)

In summary, then, a goal-oriented model of evaluation, with its primary strategy of pre-post measurement, has been challenged by a more comprehensive conceptualisation of evaluation. This more comprehensive model emphasises the need to take into account the influence of all the potentially different and conflicting goals and needs of all the various stakeholders; the need to collect rich information on the transactions involved in any programme being evaluated and not just its measurable aims and effects; and the need to acknowledge the organic, self-mutating nature of programmes as that affects participants' changing perspectives and knowledge, as well as their subsequent actions.

The implications of this for method in our study were threefold:

• first, it provided a rationale for a multiple case study as an appropriate research strategy. Depth and rich information gathered over time about many specific, comparable, social interactions for a few comparable groups of participants/stakeholders is required, rather than breadth or a search for typicality across many cases through, for example, a survey.

• secondly, it posed important questions for the pilot itself: how, exactly, might one conduct such a process evaluation and defend it as a legitimate piece of research? What would such an evaluation look like in terms of data collection and analysis techniques in the context of PD programmes in IT?

• thirdly, it provided a conceptual framework within which participants' talk and actions, their own latent assumptions or theories of evaluation, the implicit criteria by which they as participants judge the programmes, could be interpreted and categorised.

If evaluation is conceived as description with value added, and not merely as the measurement of consequences against goals, then the projected study's broad aim became to provide such a description, as a necessary backdrop against which any evaluation of the effectiveness of programmes, from any stakeholder's perspective, might more fruitfully occur. This in turn meant that it would not only have to identify the various participants' and stakeholders' perceived and evolving goals or needs, but also the extent of congruence between these expressed goals and what may be observed to happen during the progress of the PD programmes.

It also became apparent during the literature review phase that although these process questions might be necessary, prior and no less important questions to those of outcome or consequence, they were questions which the research into the effectiveness of in-service programmes in IT generally failed to ask. Overall the amount of rigorous empirical research done on the effectiveness of teacher professional development in IT is not extensive, and the impression created by that which has been conducted is that findings so far are at best anomalous and at worst contradictory or methodologically challengeable. Much of what is published in the area could be characterised as self-reported reflection or informed commentary, rather than as systematically planned research. Many projects on the implementation of 'IT' or 'technology' in schools are reported as success stories, and many attribute that success to the in-service programme(s) employed. But very few report anything about the in-service programmes themselves, other than the briefest outlines of their outward form - location, length and, sometimes, content. Very few, in other words, make the in-service event itself the object of research. Rather they tend to focus exclusively on presumably consequent teacher behaviours in the classroom, often basing their judgements solely on what teachers say about the value or effects of professional development rather than what they do during or after it. Moreover, it seems, the question 'what is it that facilitators of IT do when they facilitate professional development in IT?' is asked even less than the associated question of 'what it is that teachers do when they are facilitated?'.

Cart and Horse No. 2: Interviews and Observations in the Pilot Methodology

"Observation is always selective. It needs a chosen subject, a definitive task, an interest, a point of view, a problem. And it presupposes a descriptive language, with property words; it presupposes similarity and classification, which in turn presupposes interest, points of view, and problems." (Popper, 1963, cited in Evertson and Green, 1986, p.164)

Clearly the final evaluation was going to involve both extensive observations of professional development events, and extensive interviewing of participants and stakeholders in those events. But the exact relationship between the interviews and observations in the pilot was initially problematic. The 'traditional' relationship for programme leaders or facilitators evaluating their own programmes is to act first and ask questions later; to carry out the programme first and only later, occasionally half way through as well but more often only at the end, to ask participants what they thought about it. This clearly would be inadequate for the proposed 'independent' evaluation, and although we were determined not to impose, before the event, so great a structure on the observations that they would be reduced to a series of frequency counts and time series calculations of pre-conceived indicators of significance, we would at the very least have to develop some general characteristics and coding categories for our proposed semi-structured observations of these events.

Joyce & Showers (1988), Bolam (1997) and others have identified model characteristics of effective PD provision, and it was initially tempting to use these as coding categories for the interview data, identifying statements within the interviews and slotting them into one or other of these ideal characteristics in what Yin (1994) calls the process of "analytic generalisation" (p.30). However, it was felt that trying to fit the data into this ideal shape would have been a self-fulfilling distortion. It would not have been a true test of these models as evolving theories, nor would it be an effective way of dealing with any issues that were specific to PD in IT. Above all, however, the categories used in such a strategy would be neither derived from, nor indicative of, how the participants themselves evaluated the programmes. If we were to answer the question about what participants themselves say about professional development in IT or do when they undertake it, it was important to adopt a more inductive strategy for analysis, using more iterative, grounded theory and phenomenological processes as outlined by Strauss & Corbin (1990), Powney and Watts (1987), and Hycner (1985).

The question still remained open, therefore, of exactly what to look for in the programmes that might stand as good evidence when observing the professional development programmes in action. One possibility was to take an ethnographic stance and collect data on everything possible and slowly identify trends and characteristics that might be worthy of further investigation, and to some extent this was the stance adopted in the first observation and the first interview in the pilot. But, as McNamara (1980) points out, even the most open-minded observer must, at some point, impose filters on the data to make some different, coherent sense of it that is not the phenomenon itself but retains a recognisable sense of what it was; some line drawn cartoon, as it were, though hopefully not a caricature, which may be a poor but recognisable substitute for the original 'photograph' under investigation. After the first trial interview and observation sessions, therefore, it became clear that subsequent data would need to be collected much more purposefully and selectively. As indicated already, the main focus of the study was to collect data that might be useful for a process evaluation of the programmes. In this sense the methodological question became: if evaluation is description with value added, what, exactly, needs to be described? The answer, derived from the democratic, process models of evaluation developed by Stake, MacDonald and others, was: that which the various participants and stakeholders themselves 'evaluated'; the aggregation of characteristics or features of the various programmes that the participants and stakeholders identified in their actions and conversations as being of significance to them, their 'evaluated characteristics'. Rather than try to second guess the participants by imposing categories of interest based on what we thought might be important in such programmes, we should observe and describe what they thought was important.

Thus, in a reversal of what we had observed to be normal practice for the so often reported internal evaluations of these programmes, it was resolved to derive the characteristics of PD programmes to be particularly observed in the evaluation proper, from numerous prior interviews with participants and stakeholders in the pilot. Only later, as part of the synthesis of results, would it be relevant to see if the characteristics so developed were or were not a fit to existing theoretical models.

Horse and Cart No. 3: 'Characteristics' and 'Criteria' in Participants' Evaluative Comments

It was in the analysis of this interview data that we faced the third major methodological dilemma, which revolved around the necessity of distinguishing between 'characteristics' and 'criteria' as components of the concept of evaluation. A characteristic of something is one of its discernible features; a criterion is a standard against which an individual judges a characteristic or group of characteristics of something to be good or bad, acceptable or unacceptable. For example, when someone sets out to buy a house and says to the land agent: "I want a north Melbourne property with a large garden, at least three bedrooms, and made of brick", the characteristics to be evaluated may be conceived as location, garden size, number of bedrooms, and type of building material. The criteria, or standards, being applied are north Melbourne, large, three, and brick. A consensus among home buyers is much more likely to exist with regard to evaluated characteristics than with regard to criteria: they will tend to look at the same or similar aspects of a property, even though they may as individuals make very different judgments about it. That, presumably, is why land agents tend to describe in advertisements the same set of features for every property they put on the market. Similarly, it was felt to be more useful in terms of research design, data collection and analysis in this study, to isolate first the evaluated characteristics, meaning those observable or interpreted features of the professional development programmes about which participants and stakeholders made judgements, and only thereafter to speak of their criteria, meaning those specific, often idiosyncratic, standards by which particular participants and stakeholders finally judge a programme or its components to have been successful or unsuccessful.
The index to the study's description would thus be the participants' evaluated characteristics; the index to any synthesis of results might thus be the contrasts and comparisons, insofar as they could be reasonably inferred, among different participants' evaluative criteria.

The procedure adopted for the initial analysis of interview transcripts was first, to determine what might constitute a statement or behaviour, what Hycner (1985) calls a "sense unit", which was, or could be reasonably interpreted as being, evaluative in its nature or intent. Such units of meaning might then be collated to develop categories of the particular characteristics of PD events that participants were evaluating. A first phase in this consisted of listening to the tapes while reading the transcript and imposing an initial filter between not relevant and possibly relevant statements, highlighting anything that identified the speaker as judging or evaluating the courses or models under discussion, and the aspects or characteristics of the PD event that were being judged. From this evolved a concept of a basic unit of evaluative meaning which for convenience we call an 'evaluative comment'. That is, phrases, sentences, or paragraphs, the component parts of which could be isolated according to their role within each statement and contribute to an evolving tree of coding categories.

Some examples of evaluative comments therefore, are: (my emphasis of key phrases)

The, ah, workshop was something that I particularly enjoyed. We worked together as a group, doing something new, which is the Far Site program. (Teacher in the qualifications course)

Um, just having the, um, the issues of IT rein- reinforced. (Teacher in the qualifications course)

No...no, the strength of, one of the great strengths of what we're doing is that it's being driven by the staff and by the professional development objectives that the senior management team have set at the beginning of each year. (Resource Teacher)

R- What would be your ideal? I mean if you had free rein and people came when you actually wanted them to come...

1- Probably weekly would make, would make me, would make you do it. I mean I guess, the idea would be just to have it in your room too, wouldn't it. (Teacher in the Advisory model)

I needed to know about the internet and email. (Teacher in the Advisory model)

I've just thought of some other things that I got out of um things like the Dip. Ed.Man., apart from the relevance and usefulness of things was the camaraderie with a wide range of professionals, not just from secondary service which had been my background, and today I've got very strong professional networks that go from early childhood right through to tertiary, and, um, that's something that I value immensely. (Manager of the qualifications course)

Within any given evaluative comment there were four possible elements that might be useful in interpretation:

1. the judgement that the respondent was making (favourable or unfavourable),

2. the feature of PD that the respondent was judging,

3. the evidence or examples that were being articulated to justify the judgement, and

4. some indicator(s) of the relative importance or priority being assigned to such evidence or examples.

A more detailed explanation of the component parts of an evaluative comment and how they were interpreted in the analysis can be found in Box 1. Clearly, few of the interviewees' statements conveniently reproduced the syntax described by this model, but it provided a useful conceptual framework, a structure for reading and interpreting the transcripts and tapes.
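
For readers who wish to keep coded transcripts in electronic form, the four elements listed above might be recorded along the following lines. This is a minimal sketch under our own assumptions, not a description of the procedure used in the pilot: the class name, the field names, and the coding of the sample comment as 'content' are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EvaluativeComment:
    """One evaluative comment and its four interpretable elements (hypothetical layout)."""
    speaker_role: str                      # e.g. "Teacher in the Advisory model"
    text: str                              # the transcript excerpt itself
    judgement: Optional[str] = None        # "favourable", "unfavourable", or None if unclear
    feature: Optional[str] = None          # the feature of PD being judged
    evidence: List[str] = field(default_factory=list)             # examples offered as justification
    priority_indicators: List[str] = field(default_factory=list)  # signs of relative importance

    def missing_elements(self) -> List[str]:
        """Name the elements that could not be identified in this comment."""
        missing = []
        if self.judgement is None:
            missing.append("judgement")
        if self.feature is None:
            missing.append("feature")
        if not self.evidence:
            missing.append("evidence")
        if not self.priority_indicators:
            missing.append("priority")
        return missing


# The coding below is an illustrative assumption, not the study's own coding.
comment = EvaluativeComment(
    speaker_role="Teacher in the Advisory model",
    text="I needed to know about the internet and email.",
    feature="content",
)
print(comment.missing_elements())
```

Because few statements reproduce the full syntax of the model, every element other than the text and the speaker's role is optional, and `missing_elements` simply reports which parts remained uninterpreted.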

The analysis of interview data along these lines eventually gave rise to a tree of five key categories and twelve sub-categories of evaluative comments, from which relevant observable characteristics of PD events and participants' evaluative criteria might be derived. The key categories were:

1. comments about formal organisation.

2. comments about content.

3. comments about interpersonal dynamics or interactions.

4. comments about personal motivation and consequences.

5. comments on the uniqueness or differentiability of PD in IT compared to other forms of PD.

Categories 1, 2, 3, and 5 could be further grouped as comments made about the programmes, as opposed to those in category 4 which were comments made about themselves as participants or stakeholders.

Comments on formal organisation included features such as the location of the professional development activity, the time available and timetabling of the events, its administrative efficiency, and so on. In terms of interpreting criteria from these characteristics the unifying idea seemed to relate to a general notion of access and availability. Timing, location, and the like could be seen as being important less in themselves than as factors affecting the availability of, and participants' access to, people (such as fellow participants, technicians, and collegial experts), information (such as help sheets, timetables, room bookings, and email addresses), or equipment (such as computers, phone sockets, the right version of software, and so on). The easier such access, the more favourably the programme was perceived.

Content characteristics fell broadly into three sub-groups: technical knowledge and skills, the development or learning of practical teaching strategies and resources, and general pedagogical theory and conceptualisation. The criteria applied here varied considerably from individual to individual, but could be seen as relating to intrinsic interest.

Interactional issues included comments on locus of control issues such as empowerment or authority, or opportunities for input and initiative; on aspects of social intercourse such as sharing ideas with other teachers, the amount of individual attention given, the types of tasks and activities undertaken, or interpersonal relationships with other participants. The evaluative criteria here were often expressed in terms of how such factors affected how they felt about the programme and their participation in it.

Comments respondents made about themselves were grouped in two categories. First, there were comments on motivation, that is, comments related to pre-existing goals or other stimuli such as career advancement, reputation, a desire to be up to date, curiosity, a sense of ignorance, a sense of obligation, a commitment to innovation, or self-improvement. Secondly, there were comments on consequences, that is, comments related to post-hoc action: how or whether the programme had or had not affected their levels of understanding or their subsequent professional practices, or had provided opportunities for professional reflection, time out from normal routines, and so on.

The four main characteristics of the programmes which the respondents evaluated, therefore, (formal organisation, content, interpersonal dynamics, and personal goals or gains), and the several sub-categories within each of those headings, formed the coding categories for the interview analysis, and an index to what data would be collected from observations and documents.

The fifth category of comments is not a criterion or an 'evaluated characteristic' as such, but it was felt important to isolate this group of comments in order to get some indication of whether, and in what respects, the participants' evaluations of their PD experiences in IT were indicative of, and therefore generalisable to, their views on PD programmes as a whole. 'Is PD in IT special or unique in any way?' was a question added to the final interview schedule in order to elicit some data on this issue.

The method of collating and comparing the various data sets was as follows. Under each of the headings a particular respondent's evaluative comments were listed. Following standard grounded theory procedure (Strauss & Corbin, 1990; Hycner, 1985), repetitions of the same point were not counted, only new aspects of whatever the main characteristic was. Repetitions of the same point were merely coded as an emphasis, to indicate that the point had been repeated elsewhere in the transcript and might thus carry some sort of priority or importance. As Miles and Huberman (1994) point out, individual comments by respondents are often candidates for several coding categories, and where possible it is desirable to select a best coding category rather than to apply multiple coding. In such cases the best category was selected according to a general reading of the context of each evaluative comment, where these had been nested or prioritised in some way, with particular regard to each individual speaker's emphasis and what was interpreted to be for them the primary, or most important, characteristics. Interestingly, when this was done the comparative importance of formal organisational issues in all cases tended to recede in favour of a focus on issues of programme content, interactions and/or personal motivations. This tended to confirm Rhodes and Cox's (1990) view that form, though often mentioned by participants, is actually a subsidiary issue in teachers' evaluations of PD programmes in IT.
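By way of illustration only, the collation rule described above (list each new point once, and flag rather than recount a repetition) might be sketched as follows. This is not the procedure actually used in the pilot, which was carried out by hand and depended on the researcher's reading of context. The names CodedPoint and collate_comments, the sample comments, and the use of exact string matching to decide when two comments make 'the same point' are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class CodedPoint:
    category: str             # coding category, e.g. "content"
    point: str                # the distinct evaluative point being made
    emphasised: bool = False  # True if the respondent repeated the point


def collate_comments(comments):
    """Collapse repeated points into one entry flagged as emphasised.

    `comments` is a list of (category, point) pairs in transcript order.
    Exact matching stands in for the analyst's judgement (an assumption).
    """
    collated = {}
    for category, point in comments:
        key = (category, point)
        if key in collated:
            # A repetition is not counted again, only marked as an emphasis.
            collated[key].emphasised = True
        else:
            collated[key] = CodedPoint(category, point)
    return list(collated.values())


# Hypothetical transcript fragments, invented for illustration only.
transcript = [
    ("content", "relevance to classroom work"),
    ("interaction", "sharing ideas with other teachers"),
    ("content", "relevance to classroom work"),
]
profile = collate_comments(transcript)
```

In this invented example the respondent's profile holds two points rather than three, with the repeated point on content marked as emphasised.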

Distribution, moreover, was more important in such a synthesis than frequency. The number of times an individual respondent mentioned a characteristic of a particular model may just indicate that that particular speaker said more, talked for longer, or had a repetitive style of conversation. What was felt to be more important was the relative distribution of these characteristics as identified by the various respondents according to the respective roles they played (facilitator, teacher, stakeholder) or the model they were reviewing.
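The difference between distribution and frequency can be made concrete with a small sketch. It assumes, purely for illustration, that each coded mention is recorded as a (respondent, role, category) triple; the function name distribution_by_role and the sample data are hypothetical and are not drawn from the pilot.

```python
from collections import defaultdict


def distribution_by_role(mentions):
    """Count distinct respondents per (role, category), ignoring repeats.

    `mentions` is a list of (respondent, role, category) triples.
    """
    seen = defaultdict(set)
    for respondent, role, category in mentions:
        # A set records each respondent once, however often they spoke.
        seen[(role, category)].add(respondent)
    return {key: len(people) for key, people in seen.items()}


# Hypothetical mentions: teacher T1 raises content three times.
mentions = [
    ("T1", "teacher", "content"),
    ("T1", "teacher", "content"),
    ("T1", "teacher", "content"),
    ("T2", "teacher", "interaction"),
    ("F1", "facilitator", "content"),
]
spread = distribution_by_role(mentions)
```

A raw frequency count would credit the teachers with three mentions of content. The distribution records only that one teacher and one facilitator raised it, so that a talkative or repetitive respondent does not distort the profile.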

Thus, a sort of profile of concerns was built up on a sheet of paper for each respondent, or group of respondents, and a similar profile of action collated under the same category headings developed for the observations, though in this case additional information such as the length of time spent on a given piece of content or type of interaction was also included. A similar profile could easily be created for the documents, though this was not felt to be necessary for the pilot. The comparison of these twelve interviewee and three model profiles, taken in conjunction with a final re-reading of the transcripts and the observation notes to refocus on the whole picture rather than the specific bits of data, could then form the basis for synthesis of findings and numerous cross-case and cross-role comparisons.

Conclusion

It has been argued that the acid test of in-service, the thing which legitimates it as action, is whether or not it leads to desired change in teacher behaviour. If it does not, then it has not achieved its aims, it has not been 'effective', it was the wrong kind of in-service. The work, as it were, shall be known by its results. In one sense this is uncontentious. Clearly an examination of post-hoc perceptions, understandings and action is an essential part of any evaluation of the in-service programmes' effectiveness. But such an exclusive focus on post-hoc action is also rather simplistic, and more importantly, not helpful in terms of seeking remedy. There is little point in coming back later to explain why an in-service intervention did or did not work, if we have little or no data on what it comprised or looked like as a set of social interactions. Soundness in judgement is predicated on an intimate knowledge of the thing being judged. It is the extent of this intimacy that differentiates fact from opinion, conviction from conjecture, research from reflection.

It has been argued in this paper that in methodological terms, evaluation can be conceived as description with value added, as being the systematic investigation of both the procedural and the consequential worth of some educational practice or system. In such evolving models, moreover, the rich description of a social event, rather than just the measurement of pre-event goals and post-event effects, is highlighted as a methodologically appropriate and necessary strategy for the evaluation of educational programmes (Simons, 1987).

The literature is growing but still patchy on apparently successful forms and content of professional development in IT, but very little addresses it in terms of its interactional dynamic. Yet it is this latter which one would have thought was one of the keys to the success of any such programme. Much general classroom based research is beginning to address the question of what teachers and pupils do and say when they teach and learn, and how this is congruent or not with the manifold purposes of the enterprise. But few parallel investigations, at least in relation to teacher development in IT, seem to be happening with regard to what teachers and teacher educators do and say when they, respectively, learn and teach, in professional development events.

The pilot on which this paper is based set out to test and establish an appropriate sampling and mode of analysis for such a study, placing it in the dual contexts of contending public and professional imperatives regarding the role and significance of IT in schools, and contending concepts of appropriate evaluative epistemologies. It may be that there is no ideal model for professional development in IT, that neither sponsors' expectations nor participants' needs can ever adequately be met by any single 'mode' or model of delivery. Perhaps in professional development, as in haute couture, one size seldom 'fits' all, and rarely 'suits' any. But if there is an assumption being carried over from this pilot to the main study, it remains that professional development is a manifold and complex intervention in a teacher's professional life that is not conducive to a simplistic outcome-based analysis, and that any final assessment of its effectiveness should at least begin with a richer description than is currently prevalent in the literature of what participants say about it and do when they take part in it.

Contact:

Dr. V. R. Ham

Christchurch College of Education

Dovedale Avenue

Christchurch

New Zealand

Ph: (64) 3 3482059

Fax: (64) 3 3437731

email: vince.ham@weka.cce.ac.nz

References

Bolam R. (1997) ‘The Continuing Professional Development of Teachers’. Unpublished Draft. Swansea, GTC England and Wales Trust.

Henderson E. (1978) The Evaluation of In-Service Teacher Training. London, Croom Helm.

Hycner R. (1985) ‘Some guidelines for the phenomenological analysis of interview data.’ Human Studies, 8, 279-303.

Joyce B. & Showers B. (1988) Student Achievement Through Staff Development. London, Longman.

Kemmis S. (1989) ‘Seven Principles for Programme Evaluation in Curriculum Development and Innovation’, in House, E. R. New Directions in Educational Evaluation. London & Philadelphia, Falmer.

Maclure M. (1989) ‘Anyone for INSET? Needs Identification and Personal/Professional Development.’ in McBride, R. (Ed.) The In-service Training of Teachers. London, Falmer.

McBride R. (Ed.) (1989) The In-service Training of Teachers. London, Falmer.

McNamara D. (1980) ‘The outsider’s arrogance: the failure of participant observers to understand classroom events.’ British Educational Research Journal, 6, 2, 113-125.

Miles M. & Huberman A. (1994) Qualitative Data Analysis. 2nd Edition. London, Sage.

Nevo D. (1989) ‘The Conceptualisation of Educational Evaluation: An Analytical Review of the Literature’, in House, E. R. New Directions in Educational Evaluation. London & Philadelphia, Falmer.

Newton M. (1993) ‘Styles and Strategies of Evaluating INSET’, in Burgess, R. (Ed.) Implementing In-service Education and Training. London, Falmer.

Powney J. & Watts M. (1987) Interviewing in educational research. London, Routledge & Kegan Paul.

Rhodes V. & Cox M. (1990) Current Practice and Policies for Using Computers in Primary Schools: implications for training. Lancaster, ESRC Occasional Paper InTER/15/90, September.

Sallis P. (1990) Report of the Consultative Committee on Information Technology in the School Curriculum. Wellington, New Zealand Ministry of Education.

Simons H. (1987) Getting to Know Schools in a Democracy. The Politics and Process of Evaluation. London, Falmer.

Stake R. (1967) ‘The countenance of educational evaluation.’ Teachers College Record, 68, 523-40.

Strauss A. & Corbin J. (1990) Basics of Qualitative Research. Grounded Theory Procedures and Techniques. Newbury Park, London, New Delhi, Sage.

Weiss C. (1989) ‘The Stakeholder Approach to Evaluation: Origins and Promise’, in House, E. R. New Directions in Educational Evaluation. London & Philadelphia, Falmer.

Weiss C. (1989a) ‘Towards the Future of Stakeholder Approaches in Evaluation’, in House, E. R. New Directions in Educational Evaluation. London & Philadelphia, Falmer.

Yin R. (1994) Case Study Research. Design and Methods. Second Edition. Thousand Oaks, London, New Delhi, Sage.