Thursday, December 19, 2013

The Gachon Learner Corpus Version 2.1

The newest version of the Gachon Learner Corpus is here!

This version includes all of the most recently collected texts.

Version 2.1 contains 1,609,517 tokens from 16,113 individual texts produced by 1,607 participants.

I made sure that the data entry for all the new texts and learner profiles is consistent and free of the errors that had been pointed out in the past.

Check back for more updates soon.

Monday, December 9, 2013

How-to File and Data Cleaning

First, many thanks to the Korean Association of Corpus Linguistics for inviting me to speak about the Gachon Learner Corpus at Korea University in Seoul this past Saturday.

It was an excellent conference, and I was really excited to hear Laurence Anthony give a tutorial on AntConc. It was very nice to meet him.

So, today I finally had a chance to make a tutorial on how to use the Gachon Learner Corpus.

You can check it out here.

I also had a chance to go through version 2.0 of the corpus and fix various issues with incorrectly entered data in the learner profiles.

Finally, I was able to get a more accurate word count for the corpus (it's actually slightly larger than I had thought) and a better count on the number of participants.

Thanks for bearing with me. I'm doing all this in my spare time, so it's slow going, but we're getting there.

Announcements:

1 - The newest version of the corpus should be available in one to two weeks. Once the semester has finished, I'll compile all the writing assignments for this semester and add them to the corpus. Check back for that.

2 - The Gachon Spoken Learner Corpus is coming soon! I hope to have finalized transcripts of the speaking exams uploaded by the end of February. The exams are five-minute conversations between students in the same courses as those involved in the written corpus, using the same book and responding to exactly the same questions as the writing assignments. I'll keep a better log of updates from here on out.