Chapter 5 Plagiarism and academic integrity

(Plagiarism, proper citing, steering clear of academic misconduct, rules for collaboration)

5.1 Overview

5.1.1 Abstract:

In this course we operate a Full Disclosure Policy for attribution. This means everything that is not your own, original idea must be identified and properly attributed.6

5.1.2 Objectives:

This unit will:

  • point out the rationale for proper attribution;
  • define the “Full Disclosure Policy” for attribution in my courses;
  • introduce the resources available to you to avoid plagiarism and other academic misconduct.

5.1.3 Outcomes:

After working through this unit you:

  • are aware how academic offences can be avoided with a full-disclosure policy;
  • are familiar with proper citation formats for journal and Web contents;
  • ensure that all your activities in this course and elsewhere are in accordance with the letter of the published policies, and the spirit of scientific integrity.

5.1.4 Deliverables:

Time management: Before you begin, estimate how long it will take you to complete this unit. Then, record in your course journal: the number of hours you estimated, the number of hours you worked on the unit, and the amount of time that passed between start and completion of this unit.

Journal: Document your progress in your Course Journal. Some tasks may ask you to include specific items in your journal. Don’t overlook these.

Insights: If you find something particularly noteworthy about this unit, make a note in your insights! page.

5.2 Plagarism

Make this unit the first entry of your Journal!

This is the age of open source, boundless mashup, awesome reposts and instant repurposing of information. It might seem that our rules of referencing and citation are just another academic anachronism. After all, we all copy from stack overflow, right?

No - actually: wrong for two reasons. One: information is not any longer a commodity that increases in value if its supply is artificially constrained. Rather the value of information in academia - our common currency - is now “mindshare”, and mindshare cannot grow without attribution. And two: part of any course at UofT is its “summative assessment”: we mark submissions to evaluate the aptitude and achievements of students. That can only be done if original thinking by the student is clearly identified, and distinguished from merely repeating other’s thoughts.

But let’s face it: UofT has a plagiarism problem. This has gotten worse over the past years - and it seems worse in the CS realms than in the domains of the life sciences. Yet, even if all your peers think no one cares about missing attributions, that doesn’t make it somehow right: ethics is not a result of opinion polls. No matter how socially acceptable plagiarism has become, no matter how many others do “it”, no matter how many likes or upvotes or retweets a “No Big Deal!” post attracts, unethical behaviour is wrong. It goes against everything we stand for as scientists. And it is corrosive, not just for your community, but first and foremost for yourself.

In this course we operate a Full Disclosure Policy. That doesn’t mean you can’t get good data and examples from wherever you find them - on the contrary I absolutely expect you to hunt everywhere for information and examples: biostars, stack, RBloggers, even reddit (sometimes). There is great value in finding how others have addressed a problem, or divide up and organize a particular topic, and compiling the knowledge of the entire community is a great starting point for excellent work. But (a) this process has to be transparent, and, indeed you need to develop and document a sense of pride in mastering this art and attributing the contributions of your sources, and (b) compiling information does not substitute for your understanding of the material that you are presenting.

Regardless whether you are reusing, quoting, paraphrasing, summarizing or following someone else’s structure or advice: reference it. You can never reference too much, but if you don’t reference enough you are probably committing an academic offence and I am obliged by University Policy to take the issue to the Office of Student Academic Integrity (OSAI). Regardless whether you are writing an assignment, updating your journal, uploading code, replying to posts on the mailing list - for anything that is submitted for credit, directly or indirectly: make sure all your references are complete. The principle is quite simple:

** Full disclosure policy for this course:**

  • If it’s not your own, new idea, it has a source.
  • All sources must be referenced.

5.3 How to reference

You probably have seen resources that refer to other’s observations or opinions, and teach you to reference in manuscripts and essays in the life sciences and the humanities. These are generally less relevant for the kind of work that we do, and perhaps this is one of the reasons for poor uptake. Indeed, most of the writing in our courses is informal, and it may not be obvious how to properly embed citations in the flow of the narrative. The solution is to thouroughly contextualize your attributions with statements such as:

  • “inspired by…”,
  • “based on…”,
  • “according to …”,
  • “following …”,
  • “see also …”,
  • etc. as appropriate.

Some specific points to consider:

  1. On Wiki pages, use the text tag to organize your citations.
  2. In R code, put your citations directly into the code in a comment.
  3. The URL of an image on the Internet is not by itself an adequate reference. Author and context must be shown.
  4. All links in references must be to the original source, not to e.g. a blog post about the source.
  5. Code that is copied from e.g. stack overflow or similar must be referenced with a link to the post AND the name of the author.
  6. Use the APA citation format for this course.

5.4 When and what to reference

Some material may be “common knowledge”. Obviously, you don’t need to cite the resource that taught you how to write a for-loop, for example. Where to draw the line? If in doubt, ask, discuss, seek consensus in the community and in class.

  • Example code from an R package vignette that you adapt and reuse must be referenced.
  • Code that you take from a different course must be referenced.
  • Code that you take from this course must be referenced.
  • Code that you have translated from a different language must be referenced.
  • Code that you have jointly developed with a classmate must acknowledge the contribution.
  • Code that you have copied from a classmate’s work on the Student Wiki must be referenced.
  • Code that is open source must be referenced.
  • Code that is in the public domain must be referenced. This one is particularly troublesome: some authors put their code into the public domain, and state that you are free to use it without any copyright restrictions and without need for acknowledgement. But this can give rise to a misunderstanding: it only refers to the legal status as far as reuse is concerned, it says nothing about authorship. Obviously: just because someone graciously allowed you to use their idea, that does not mean that you are the author.

5.5 Reusing previous material

A second mistake that has brought students to the Dean’s office more than once is re-use of material from previous courses. This is a simple one: you can’t get academic credit for the same material twice. This means: if you have already submitted something for a different course elsewhere, or for a different assessment in the same course, it is no longer an original contribution. Of course you can cite your own work and then reuse it - if it’s useful, bravo - good for you. But you have to be upfront about it, and apply the Full Disclosure Policy in spirit. Again: if in doubt, ask for advice.

5.6 “Adjusting” results

It sometimes happens that a piece of code you are submitting won’t run. It just won’t. You can’t see the mistake, it’s three in the morning and you just can’t take it anymore. It’s just a small variation from the spec - and you can easily fix the output by hand.

So you edit a few lines in a printout, or a few cells in a spreadsheet, and submit that result. All good, right?

No. Not good at all. You have just falsified your code output. In terms of academic misconduct this is called “concoction”. And it’s pretty high on the list of things that will make for a very bad day.

Do not ever change code output by hand. If an assignment asks you to submit code and results, the exact code you submit must generate the exact output that is claimed for it. Obvioulsy, there will be assessment scripts that will verify that. And when the assessment script signals a discrepancy, that will set off a process …

Above, I’ve highlighted a few issues that I have come accross in past courses. Below, are extensive resources that will help you work better. Go and read them.

5.7 Task 4

5.8 A final note

If you have any questions about these policies, if you would like to start a discussion, or if you can share some anecdotes: post on the mailing list. We need to make these policies alive and meaningful, otherwise we’ll see mistake after mistake, tears, anguish and missed graduations. No more!


  1. This includes journal articles and books, obviously, but also blogs, discussions on StackOverflow, Github gists, and especially course material of this and other courses, your peer’s notes and journal entries. and material you have produced previously.