An Experiment In Audio Transcription

June 18th, 2010

The BrailleSC project currently has almost 30 oral histories on video, some of them already transcribed but most of them not. For transcription work, we’ve been using Amazon’s Mechanical Turk service, which is certainly reliable, affordable and efficient (as I wrote here), but it’s not exactly ethical (as I wrote here). Is there another solution? What about crowdsourcing? Could we, in other words, make the task of transcription available to an army of potential volunteers from across the Internet and get good results?

Plenty of online projects rely on crowdsourced work with (mostly) good results: see, for instance, LibriVox, Project Gutenberg, and Wikipedia. And recently, George Mason’s Center for History and New Media was awarded an NEH Digital Humanities Start-Up Grant “to support the design and development of a tool for crowdsourcing documentary transcription:”

The $49,215 award will enable CHNM’s dev team to build an open source tool to enable researchers to contribute document transcriptions and research notes to digital archival projects, thus harnessing the power of the community of users to improve the discoverability and usefulness of the archive.

That’s a very exciting project, and I look forward to seeing the results. What about crowdsourcing audio transcription?

Consider this an invitation to participate in an informal experiment in volunteer transcription work. Here’s the question we’d like to answer:

Can a project involving audio or video recordings of spoken words rely on volunteers for transcription of interviews broken up across short clips?

Transcriptions will allow for various forms of textual analysis and re-use of the interviews, and transcriptions will also aid in creating captions to accompany the videos, which will make them accessible to users with hearing impairment.

What I’ve done is take one interview and break it up into 2-minute clips. I chose that length somewhat (but not completely) arbitrarily. Each clip is hosted on YouTube, where you may view it while transcribing. And while it’s true that YouTube has added automatic captioning to its videos, these captions don’t always work, and when they do, their accuracy leaves something to be desired.
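
For anyone who wants to prepare an interview of their own in the same way, here is a minimal Python sketch of how the cut points for 2-minute clips might be computed. The names `clip_boundaries` and `clip_label` and the 120-second default are illustrative assumptions, not a description of the tools actually used for these videos; the zero-padded labels simply mirror the "Part 01" style of numbering.

```python
def clip_boundaries(total_seconds, clip_seconds=120):
    """Return (start, end) pairs, in seconds, that cover the whole recording.

    Hypothetical helper: every clip is clip_seconds long except possibly
    the last one, which holds whatever remains.
    """
    if total_seconds < 0 or clip_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and clip_seconds > 0")
    boundaries = []
    start = 0
    while start < total_seconds:
        end = min(start + clip_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


def clip_label(index):
    """Return a zero-padded label such as 'Part 01' for a 1-based clip index."""
    return f"Part {index:02d}"


if __name__ == "__main__":
    # Example: a 5-minute (300-second) interview.
    for i, (start, end) in enumerate(clip_boundaries(300), start=1):
        print(clip_label(i), start, end)
```

With these assumptions, a 5-minute interview yields three clips, the last of which is shorter than the others. A different clip length can be tried by passing another value for `clip_seconds`.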

Here are the details:

  1. Unless you instruct us otherwise, we will credit you by name on the web site for your work.
  2. All of the materials we produce at BrailleSC will be published with a Creative Commons license allowing others to make use of them under certain conditions (Attribution-Noncommercial-Share Alike), so your work could potentially benefit many projects (if any other projects take our materials and work with them, that is).
  3. To volunteer, go to this page and follow the directions.

Any questions or comments about this process (or about the challenge of transcribing audio)? Please leave them below.

Thanks!

[Creative Commons-licensed flickr photo by Beverly & Pack]

13 Responses to “An Experiment In Audio Transcription”

  1. vika Says:

    It took me about 12 minutes to transcribe Part 01, but I had some interruptions — maybe 10m of actual transcription time?

    I think I’d do this on a volunteer basis more than once; it’s a good (not too long) work break. On the other hand, if there are hundreds of these snippets in a project, it might feel like a drop in the bucket. You might consider defining a *significant* contribution. Something like, “every bit of transcription helps, but if you would like to get Really Involved, we’d be grateful if you transcribed a total of five video segments.” Maybe that’s just my quantitative brain speaking. But paradoxically, if I know what the project thinks is significant, and I get there, then I’m more likely to come back and do more.

  2. George H. Williams Says:

    That’s a good suggestion, Vika. Maybe there could be different levels of recognition in increments of minutes transcribed. I know that non-profits will do this to recognize their donors, creating such categories as “supporters,” “sponsors,” and “patrons” corresponding to different levels of support. Something similar might work here.

  3. Tom Says:

    If it was more like a web application, it’d be nice to be able to select components in minute increments. Then you could also build in the game-like progression/reputation elements you’re talking about while also giving me some confidence I’m not re-transcribing something that’s already been done.

    There’s probably a lot that the folks who do transcription of written records like LDS could talk to you about in terms of creating community etc. around this type of work. The software might also have some application. I saw some of it and how it worked when I did some work with Prof. Walden (Univ. of Richmond) around the Freedmen’s Bureau Records.

  4. George H. Williams Says:

    Tom, which software are you referring to in your second paragraph? Do you have a link for more information?

    I like the ideas you share—in your first paragraph—about creating a system by which each volunteer’s contributions are tracked and added to their “reputation.” Sort of like earning points in a game, as you suggest.

    As for “confidence [that you're] not re-transcribing something that’s already been done,” note that the first item in the instructions is to specify which video you’re going to transcribe so that someone else doesn’t replicate your work (or vice versa) while you’re doing it. Maybe that step needs to be rephrased for clarity?

  5. Tom Says:

    Yeah, I hit refresh a few times before leaving to make sure no comments had come in claiming the transcript (and then didn’t leave my own comment). Let’s pretend I was testing how idiot proof it was.

    This is the software/site that I believe I saw at that time, although it seems more polished at this point. It’s been two years or so.

  6. Carla Says:

    Well, this brought back memories of my work/study job, and how!

    Mark Maurer is quite easy to transcribe, whereas Pat is quite difficult. Transcribing Part 04 was much more labor intensive than Part 05 and required having to repeatedly stop, transcribe, replay and double-check Pat’s speech. She backtracks quite a bit, interrupts herself, reorganizes her thoughts mid-sentence, and speaks quite quickly. She also doesn’t speak as clearly as Mark. Because of the difficulty of transcribing her speech, I’d recommend trimming the videos of her speaking down to something less than 2 minutes (1 minute?).

    As I mentioned to you in a DM, it is quite difficult for me to skip the “um” aspects of speech, especially because Pat uses them so often. I relied on my previous work/study experience w/r/t run-on sentences (a lot of people speak in them, Pat in particular): I transcribed them as such. Are you looking for true speech patterns? I assumed that if you were looking for cleaned-up versions of the transcription, you would do that yourself, post initial transcription.

    Vika makes an excellent point about asking volunteers for levels of help, too. People will attempt to meet your requirements if they know what goal they are trying to help you reach. And I did the same thing that Tom did with refreshing a few times to see what was being worked on prior to beginning transcription. That’s why I started on Part 05 first: it looked like that area was as yet untouched.

    Other than that, it was a breeze. :)

  7. George H. Williams Says:

    Carla, it seemed clear to me that Dr. Maurer is very experienced with public speaking (even extemporaneous public speaking) while Mrs. Maurer’s speech is perhaps more typical of what an individual’s oral history will sound like. Also, the lavalier microphone was attached to Dr. Maurer’s lapel, which results in his voice being more clearly recorded than Mrs. Maurer’s. I did a little tweaking of the audio to correct for this, but I believe the difference is still pretty noticeable.

  8. joanna Says:

    It took me an hour and fifteen minutes to type up section #3, which was largely a transcript of Mrs. Maurer discussing how she uses Braille.

    Twenty minutes were spent tinkering with machines and figuring out what worked best for me. Once I settled into using my iPod for listening (after saving the video) and the laptop for typing, things went pretty smoothly.

    My typing reminded me of what using a MOO is like–I’d type as much as I could retain in memory, skip some spaces and pick up with the next idea. I had that same feeling of having to ignore what else was going on on the screen in order to finish typing my thought (or, Mrs. Maurer’s thought, in this case.) So, I typed in strips, so to speak, going back and relistening and filling in more and more of what had been said.

    Would I be happy with three dollars an hour for doing this? No. Three dollars wouldn’t cover the time, the electricity and the cost of using two machines. Would I do it again for an Open Source project which was explained from the start? Absolutely.

  9. joanna Says:

    Regarding significant contributions–Before I came here, I spent some time at the Pancreatic Cancer site, making a donation for a cousin’s planned run in October. Donations on her PanCan page are represented by a thermometer, with her total goal listed on top. Each 50-dollar increment is represented by the thermometer’s mercury rising a bit more. Perhaps some kind of graphic that would represent what we’ve pledged to do and how far we’ve gotten (divided into two-minute increments) would inform and motivate each other and anyone thinking of joining.

  10. George H. Williams Says:

    Joanna, did you find that you had to just listen to the words—rather than watch the video—in order to concentrate on the task of transcription? I’m curious as to why you saved the video to your iPod instead of just playing it on (and listening with) your laptop.

    And I like your suggestion (similar to Tom’s, above) of some kind of visual representation of progress.

    Thanks!

  11. joanna Says:

    What motivated my technique was the flipping back and forth from video to word-processing screen. I worked on my laptop without another screen/monitor, so I wanted some stability, and yes, after the first round of writing, I focussed on listening to the words and reading what I had typed so that I could insert text in the spots as well as check my accuracy. Watching the video would have been distracting. I have no training in transcribing things, so I did what seemed most efficient.

  12. Kaitlin Says:

    It took me about 15 minutes to do part 01 of video 02, primarily because I had to stop and start over (I got a phone call in the middle of it). I started by using Notepad, but I switched to using a Side Note from OneNote because I could have both the note and the YouTube screen up at the same time.

    I should note that I used my Skype headphones (so, sound in only one ear) since my laptop’s speakers are not very good.

    I had a little trouble with the accent – I’m a New Englander and her accent is very southern to me, so I think that could be why I missed some words. I don’t know if this would affect the results, but it might be worth noting the nationality or the accent of the video’s speaker before offering it up for transcription. I helped a colleague audio-record the Q&A sessions of a conference, and when looking up transcription services, I noticed that some services charge extra for “hard” recordings, such as those which contain unfamiliar accents.

  13. Kaitlin Says:

    Oops, I meant to say that it took me about 20 minutes, but 15 when I had time from start to finish.

    Also: It would be cool if there was some way to “sign out” the videos – I remember there being something like this for files when using Dreamweaver, so if you were collaborating on a website you wouldn’t have two people editing the same file.