Saturday, March 15, 2014

Status of Spoken Learner Corpus

Well, it's halfway into March, and I still haven't finished transcribing and sorting the first version of the spoken corpus.

Sorry to anyone out there who might be waiting for it.

To be honest, I'm not sure when it will be done. The two weeks of winter break I had planned to use to complete the project were lost to unforeseeable troubles completely unrelated to it.

Now we're in the full swing of the spring semester, and I'm swamped with work. I have six classes this semester with a total of around 160 students. That's a lot of marking and classroom management to be done. On top of that, I am conducting a very large DDL/Student Concordancing experiment with two other professors this semester, which takes up a lot of time (but will be worth it...it's the largest experiment of this kind ever done!). On top of that top, I've got to edit my conference proceedings submission from last year's KOTESOL conference and get my proposal done for next year's (it's gonna be on, you guessed it: DDL). Finally, I'm taking an edX course on statistics, which is hopefully gonna reinforce that I actually understand what all those numbers coming out of SPSS mean.

In any case, I don't expect to post the spoken corpus until sometime this summer.

But, there will be a new version of the written corpus out just after this semester ends. We might finally get to 2 million words. We already have the largest corpus of English produced by Korean learners (by far). My goal is to keep pushing and hopefully inspire others to keep producing larger corpora...and give them away for free.

These things really shouldn't be sold. Not when it costs so little to make them and so few of us love them enough to use them in our research.