Tuesday, October 7, 2014


So, obviously, I haven't gotten around to any of the things I thought I would have done by now.

The spoken learner corpus is still very far off and I haven't yet added the written data from last semester.

However, now that the KOTESOL conference is behind me and I'm settling into the routines of having a second child (yay for me!), I hope to have the new, larger version of the GLC up in a few days...maybe by the weekend.

Saturday, March 15, 2014

Status of Spoken Learner Corpus

Well, it's halfway into March, and I still haven't finished transcribing and sorting the first version of the spoken corpus.

Sorry to anyone out there who might be waiting for it.

To be honest, I'm not sure when it will be done. The two weeks during the winter break I was going to use to complete the project were lost to unforeseeable troubles completely unrelated to the project.

Now we are in full swing in the spring semester and I'm swamped with work. I have six classes this semester with a total of around 160 students. That's a lot of marking and classroom management to be done. On top of that, I am conducting a very large DDL/Student Concordancing experiment with two other professors this semester, which takes up a lot of time (but will be worth it...it's the largest experiment of this kind ever done!). On top of that top, I've got to edit my conference proceedings submission from last year's KOTESOL conference and get my proposal done for next year's (it's gonna be on, you guessed it: DDL). Finally, I'm taking an edX course on statistics, which will hopefully confirm that I actually understand what all those numbers coming out of SPSS mean.

In any case, I don't expect to post the spoken corpus until sometime this summer.

But there will be a new version of the written corpus out just after this semester ends. We might finally get to 2 million words. We already have the largest corpus of English produced by Korean learners (by far). My goal is to keep pushing and hopefully inspire others to keep producing larger corpora...and to give them away for free.

These things really shouldn't be sold. Not when it costs so little to make them and so few of us love them enough to use them in our research.

Sunday, January 5, 2014

GLC 2.1 Text Files from Jae-Woong Choe

I've uploaded a nice gift from Jae-Woong Choe, a professor in the Department of Linguistics at Korea University.

He has created individual text files for each of the texts in the current corpus along with more accurate token counts for the overall corpus.

You can download these files here.

Many thanks to him for creating these files!