Archive for June, 2010

Guidelines for proofreading

Monday, June 28th, 2010

If you’re proofreading the transcriptions of BrailleSC interviews, please follow these guidelines:

  • Check that the transcription matches what’s being said.
  • Check spelling and punctuation.
  • Don’t worry about grammar if the transcription matches the audio.
  • Use only one space between the end of one sentence and the beginning of the next.
  • Ignore any verbal filler like “Um” or “Uh” or “like” or “you know” or “I mean.” Such filler does not need to be included in the transcription. If it’s already there, you can delete it, or you can ignore it (in the interests of time) and count on me to catch it in the final phase of proofreading.
  • Where someone laughs, please include “[Laughter]” in that part of the transcription. This will be helpful for deaf users who read the transcription or watch a captioned version of the video.
  • Note that the word “braille” does not need to be capitalized unless it’s used as part of a proper name, like “Braille National Challenge” or “Louis Braille.”
  • Any questions? Just leave them in the comments below. Thank you!
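For anyone who would rather have a script point out likely problems before a human pass, here is a minimal sketch of two of the guidelines above (verbal filler and extra spaces between sentences). The function name, the labels, and the regular expressions are assumptions made for illustration, not part of the BrailleSC workflow. The word "like" is deliberately left out because it is so often a legitimate word, and the script only flags matches rather than deleting anything.

```python
import re

# Hypothetical checks, one per guideline. The patterns are assumptions:
# "like" is omitted on purpose because it is usually a real word.
CHECKS = {
    "filler": re.compile(r"\b(?:um|uh|you know|I mean)\b", re.IGNORECASE),
    "extra space": re.compile(r"(?<=[.!?]) {2,}"),
}


def find_issues(transcript: str) -> list[tuple[str, int, str]]:
    """Return (label, character offset, matched text) for each likely problem."""
    issues = []
    for label, pattern in CHECKS.items():
        for match in pattern.finditer(transcript):
            issues.append((label, match.start(), match.group()))
    # Sort by position so the proofreader can work through the text in order.
    return sorted(issues, key=lambda issue: issue[1])


if __name__ == "__main__":
    sample = "Um, I learned braille at school.  It was, you know, hard."
    for label, offset, text in find_issues(sample):
        print(f"{label} at character {offset}: {text!r}")
```

A flagged phrase such as "you know" may still be meaningful in context, so the final decision stays with the proofreader.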

An Experiment in Audio Transcription

Friday, June 18th, 2010

The BrailleSC project currently has almost 30 oral histories on video, some of them already transcribed but most of them not. For transcription work, we’ve been using Amazon’s Mechanical Turk service, which is certainly reliable, affordable and efficient (as I wrote here), but it’s not exactly ethical (as I wrote here). Is there another solution? What about crowdsourcing? Could we, in other words, make the task of transcription available to an army of potential volunteers from across the Internet and get good results?

Plenty of online projects rely on crowdsourced work with (mostly) good results: see, for instance, LibriVox, Project Gutenberg, and Wikipedia. And recently, George Mason’s Center for History and New Media was awarded an NEH Digital Humanities Start-Up Grant “to support the design and development of a tool for crowdsourcing documentary transcription”:

The $49,215 award will enable CHNM’s dev team to build an open source tool to enable researchers to contribute document transcriptions and research notes to digital archival projects, thus harnessing the power of the community of users to improve the discoverability and usefulness of the archive.

That’s a very exciting project, and I look forward to seeing the results. What about crowdsourcing audio transcription?

Consider this an invitation to participate in an informal experiment in volunteer transcription work. Here’s the question we’d like to answer:

Can a project involving audio or video recordings of spoken words rely on volunteers for transcription of interviews broken up across short clips?

Transcriptions will allow for various forms of textual analysis and re-use of the interviews, and transcriptions will also aid in creating captions to accompany the videos, which will make them accessible to users with hearing impairment.

What I’ve done is take one interview and break it up into 2-minute clips. I chose that length somewhat (but not completely) arbitrarily. Each clip is hosted on YouTube, where you may view it while transcribing. And while it’s true that YouTube has added automatic captioning to its videos, these captions don’t always work, and when they do, their accuracy leaves something to be desired.
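As a rough illustration of the splitting step, the sketch below computes the start and end times of fixed-length clips for a recording of a given duration. It is not the tool actually used for these videos (the post doesn't say which one was), the function names are hypothetical, and the durations in the example are made up.

```python
def clip_ranges(total_seconds: int, clip_seconds: int = 120) -> list[tuple[int, int]]:
    """Split a recording into consecutive (start, end) ranges, in seconds.

    The last clip is shorter when the duration is not a multiple of clip_seconds.
    """
    if total_seconds < 0 or clip_seconds <= 0:
        raise ValueError("duration must be >= 0 and clip length must be > 0")
    return [
        (start, min(start + clip_seconds, total_seconds))
        for start in range(0, total_seconds, clip_seconds)
    ]


def as_timestamp(seconds: int) -> str:
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Hypothetical 5-minute recording, not one of the BrailleSC interviews.
    for part, (start, end) in enumerate(clip_ranges(300), start=1):
        print(f"Part {part:02d}: {as_timestamp(start)} to {as_timestamp(end)}")
```

Changing `clip_seconds` makes it easy to try other clip lengths if 2 minutes turns out to be too long or too short for volunteers.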

Here are the details:

  1. Unless you instruct us otherwise, we will credit you by name on the web site for your work.
  2. All of the materials we produce at BrailleSC will be published with a Creative Commons license allowing others to make use of them under certain conditions (Attribution-Noncommercial-Share Alike), so your work could potentially benefit many projects (if any other projects take our materials and work with them, that is).
  3. To volunteer, go to this page and follow the directions.

Any questions or comments about this process (or about the challenge of transcribing audio)? Please leave them below.

Thanks!

[Creative Commons-licensed flickr photo by Beverly & Pack]

Crowdsourcing Audio Transcription

Friday, June 18th, 2010

[Image: Maurer (screengrab)]

We’re trying to figure out if it will be possible to use volunteers from across the Internet to transcribe spoken words recorded on digital audio or video. To participate in this informal experiment, please follow the instructions below.

(For a more detailed explanation of what this is all about, please read this.)

Thank you!

Instructions

  1. Before you do anything else, leave a comment below stating which part of which video you are going to transcribe. This will prevent two people from accidentally transcribing the same video clip.
  2. On your computer, open a simple text editor like Notepad (if you’re a Windows user) or TextEdit (if you’re a Mac user). Use this application for creating your transcription (and save often!).
  3. While watching and listening to your chosen video on YouTube, transcribe what’s being said.
  4. Ignore any instance of “um” or “uh.” Where there’s laughter, simply type “[Laughter].” If you can’t make out what’s being said, simply type “???” in that portion of your transcript.
  5. When you are finished with your transcript, please copy and paste it into a comment below, identifying which video clip you’ve transcribed.
  6. Finally, please go here to leave any observations you’d like to share about your experience. How long did it take you to transcribe 2 minutes of audio, for example? Do you have any ideas for how to improve this process?

Links to video clips

Video 01

Part 01 :: Part 02 :: Part 03 :: Part 04
Part 05 :: Part 06 :: Part 07 :: Part 08
Part 09 :: Part 10 :: Part 11 :: Part 12
Part 13 :: Part 14 :: Part 15 :: Part 16

Video 02

Part 01 :: Part 02 :: Part 03 :: Part 04
Part 05 :: Part 06 :: Part 07 :: Part 08
Part 09 :: Part 10 :: Part 11 :: Part 12
Part 13 :: Part 14 :: Part 15 :: Part 16
Part 17 :: Part 18

Questions or comments about this process?

If you have any questions or comments about this project or this process, please leave them in the comments section of this post: “An Experiment in Audio Transcription.”

Thanks!

[Creative Commons-licensed flickr photo by ghwpix]