TOEFL® Speaking: Deep Dives

Reliability and Comparability of TOEFL iBT® Scores

My Speaking Score (TOEFL Speaking Prep) Season 1 Episode 36


In this episode of the TOEFL Speaking Prep Podcast titled Reliability and Comparability of TOEFL iBT® Scores, we dive into the intricate world of standardized testing. Have you ever wondered how your TOEFL score reflects your true English proficiency, no matter where or when you take the test? Join us as we explore the key processes that ensure TOEFL scores are reliable and comparable across different test versions.

We break down the five pillars of reliability, from standardized administration to how AI and human raters work together to evaluate speaking and writing tasks. Plus, we’ll discuss the cutting-edge technology behind SpeechRater and the importance of equating to ensure fairness.

Whether you’re preparing for your exam or just curious about how it all works, this deep dive will reveal the extensive efforts that go into making TOEFL scores trustworthy and meaningful.

My Speaking Score serves thousands of users across the globe by helping them data-power their TOEFL Speaking prep.

Welcome back to the Deep Dive. We're diving into something a listener specifically requested about standardized testing. Oh, interesting.

(0:27 - 0:34)
Yeah. How do they actually make sure the scores are reliable? And we're looking specifically at the TOEFL iBT. Right.

(0:34 - 0:35)
Right. The TOEFL. Yeah.

(0:35 - 0:39)
That English proficiency test. People use it to get into universities all over. All over the world.

(0:40 - 0:50)
Yeah. So how do they know that your score is really reflecting your English, even if you take a different version of the test? Yeah. That's a good question.

(0:50 - 0:59)
It's something that a lot of people probably don't think about, but different test forms, different times you're taking it, different locations, making sure all of that is comparable, that's a big job. Yeah. Yeah.

(1:00 - 1:13)
Like what goes into that? But before we get into all that, what is the TOEFL iBT? So the TOEFL iBT is like the gold standard when it comes to these English proficiency exams. Okay. We're talking globally recognized.

(1:13 - 1:17)
Universities use it in over 150 countries. For admissions. Yeah.

(1:17 - 1:26)
So it's a pretty big deal if you want to study abroad or you need to prove that your English is good enough, you know, internationally. So it's high stakes. Yeah.

(1:26 - 1:31)
Absolutely. And so that makes this whole reliability thing even more important. Oh, for sure.

(1:31 - 1:37)
You would not want to like take the test twice and get two wildly different scores. No. That would be awful.

(1:37 - 1:53)
That wouldn't, yeah, that wouldn't be good. And that kind of gets to the heart of what we mean when we talk about reliability in this context. It means that you can be confident that whatever score you get, it really reflects your English, not just, you know, some weirdness in the test itself.

(1:53 - 2:01)
Right. So how do they do that? What's the secret? Yeah, it's not really a secret. It's more like a system, a very, very thorough system.

(2:01 - 2:12)
We dug into a research paper because you were curious about this, and it outlines five key areas that ETS, the organization that makes the TOEFL, focuses on. Okay.

(2:12 - 2:18)
So they really emphasize standardized administration and security. That basically means no cheating. Okay.

(2:18 - 2:24)
And that everyone takes the test under the same conditions. They have very, very detailed test specifications. Okay.

(2:24 - 2:38)
So standardized, I get that, like, you know, no cheating, but like, what's a detailed test specification? So it sounds very official, but really... It does. It sounds scary. It's just like the blueprint for the test.

(2:39 - 2:58)
Okay. So it outlines everything, how many questions, what format they'll be in, what specific, you know, content they're going to cover. So they have this like roadmap to make sure that even if the questions are different from one TOEFL to the next, the difficulty and what it's actually testing is the same.

(2:59 - 3:05)
It's like the same recipe, but with slight variations. Yeah. You can get creative within the boundaries.

(3:05 - 3:19)
Okay. So how do they even come up with the test? Like, how do they create it? So this is really cool. They use something called evidence-centered design to make sure that every single question is actually measuring what it's supposed to be measuring, which is your English.

(3:20 - 3:26)
So it's not just like pulling random vocab words? No, no, no. It's way more. It's way more involved than that.

(3:26 - 3:39)
There's a whole process. They pilot test new questions and tasks. They collect tons of data, analyze it to weed out any questions that aren't really doing their job.

(3:39 - 3:41)
It's a whole thing. That's intense. Yeah.
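The pilot-testing step described here (collect data, then weed out questions that aren't doing their job) is commonly done with classical item analysis. The sketch below is only an illustration under our own assumptions, not ETS's actual procedure: for each question it computes difficulty (the proportion of test takers who answered correctly) and discrimination (the correlation between that question and the rest of the test), then flags questions outside thresholds. The function names, the 0.2 to 0.9 difficulty band, the 0.2 discrimination cutoff, and the response data are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def item_analysis(responses, p_range=(0.2, 0.9), min_disc=0.2):
    """Return indices of pilot questions that look too easy, too hard, or uninformative.

    responses: one list of 0/1 scores per test taker (1 = correct).
    The thresholds are hypothetical, chosen only for illustration.
    """
    flagged = []
    for j in range(len(responses[0])):
        item = [r[j] for r in responses]
        rest = [sum(r) - r[j] for r in responses]  # total score excluding this question
        p = sum(item) / len(item)                  # difficulty: proportion correct
        disc = pearson(item, rest)                 # discrimination
        if not (p_range[0] <= p <= p_range[1]) or disc < min_disc:
            flagged.append(j)
    return flagged

# Invented pilot data: six test takers, four questions
responses = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
]
print(item_analysis(responses))  # [0, 2]
```

Question 0 is flagged because everyone got it right, so it tells us nothing; question 2 is flagged because stronger test takers did worse on it than weaker ones.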

(3:42 - 3:48)
Yeah. And they really want to make sure that nobody's cheating. I read that they use like biometric data and things like that.

(3:48 - 3:51)
Yes. Yes. What is that? That's to make sure everything is standardized.

(3:51 - 3:56)
Like we talked about. Like, are you who you say you are? That kind of thing. So they use things like fingerprint scanning.

(3:56 - 4:02)
Oh, wow. So they know it's really you taking the test. So you go to this testing center and they're like... It's fingerprint.

(4:02 - 4:05)
Fingerprint. It's like a spy movie. Basically.

(4:06 - 4:13)
It's all to make sure that everyone's on a level playing field. Yeah. And that the scores actually, you know, mean something.

(4:13 - 4:16)
That actually makes me feel better. Yeah. It's good to know these things are in place.

(4:16 - 4:21)
Okay. So we've got the secure testing centers. We've got these like very thought out questions.

(4:23 - 4:34)
But then what about the actual scores? How do you compare? Like, if you're taking different versions, how do you compare those scores? Right. Especially if, like you were saying, the questions are different. Exactly.

(4:34 - 4:45)
And that's where equating comes in. Okay. So equating is how they make sure that a 25 on one version of the TOEFL means the same thing as a 25 on a different version of the TOEFL.

(4:46 - 4:47)
Okay. Even if the questions are different. Okay.

(4:48 - 4:52)
So that makes sense for like multiple choice. Like you got it right. You got it wrong.

(4:52 - 4:56)
Right. But what about like speaking and writing? Because that's a person. Right.

(4:56 - 5:01)
Yeah. Grading you. So how do you equate that? So you've hit on like the big challenge.

(5:02 - 5:10)
Okay. Of making sure the TOEFL scores are really comparable across the board. Speaking and writing, much trickier.

(5:10 - 5:15)
Much, much trickier. Because we can't just keep using the same questions over and over again. Right.

(5:15 - 5:18)
Because then people will share them. Exactly. Yeah.

(5:18 - 5:23)
It's a security risk. People could memorize answers and then the whole system falls apart. Right.

(5:23 - 5:38)
So we have to change the questions, which means we have to figure out how to compare scores even when the questions are different. So what do they do? So for the speaking and writing, it's pretty fascinating. They use a really intense scoring process.

(5:38 - 5:45)
Okay. Involving multiple raters and some very, very fancy statistical methods. Okay.

(5:45 - 5:51)
To minimize the differences between those different raters. Okay. And make sure everyone is grading to the same standard.

(5:52 - 6:00)
So you're saying that like if I take the TOEFL, my speaking and writing isn't just graded by one person. Exactly. That is a great, you're catching on.

(6:00 - 6:07)
Each response is actually graded by a human and an AI scoring system. Hold on, hold on. I know.

(6:07 - 6:12)
AI graders. Yes. So a computer is listening to me talk and judging my English.

(6:12 - 6:26)
Well, I don't know about judging, but it's analyzing it. Things like how fluent you are, what kind of words you're using, your grammar, all those core speaking skills. And it's trying to be as objective as possible.

(6:26 - 6:40)
Really focused on what the language actually demonstrates about your skill level. But how do they know that the AI is as good as a person? That's a great question. And that is something they have researched a lot.

(6:40 - 6:59)
They've done tons of studies comparing the scores from the AI and from human raters. And what they're finding is actually really interesting. So they found that for certain tasks, if you look at the independent writing task, the AI actually predicts how well you'll write in the future even better than a human reader could.

(6:59 - 7:05)
Oh, wow. Yeah. So like it can actually tell how good of a writer you're going to be.

(7:05 - 7:11)
So it's like better than a teacher, basically. Yeah. Well, for that particular task, it seems like it.

(7:11 - 7:20)
Now, in fairness, it was just that one kind of writing. And for other ones, it's, you know, the AI and humans are pretty much the same. But it is still, it's a very interesting finding.

(7:20 - 7:26)
It's very interesting. Makes you think about, you know, how we even measure these things. Yeah, what it means to be good at English.
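Studies comparing AI and human scores often report how closely the two agree on the same responses. As a hypothetical illustration (the episode does not say which statistics ETS reports), this sketch computes exact agreement, adjacent agreement (scores within one point), and the Pearson correlation between human and AI scores. The function name, the assumed 0 to 4 rubric, and the scores are invented; the sketch does not reproduce the predictive finding about the independent writing task.

```python
from math import sqrt

def agreement_stats(human, ai):
    """Compare human and AI scores given to the same set of responses."""
    n = len(human)
    exact = sum(h == a for h, a in zip(human, ai)) / n              # identical scores
    adjacent = sum(abs(h - a) <= 1 for h, a in zip(human, ai)) / n  # within one point
    mh, ma = sum(human) / n, sum(ai) / n
    cov = sum((h - mh) * (a - ma) for h, a in zip(human, ai))
    var_h = sum((h - mh) ** 2 for h in human)
    var_a = sum((a - ma) ** 2 for a in ai)
    r = cov / sqrt(var_h * var_a)  # Pearson correlation (assumes neither list is constant)
    return exact, adjacent, r

# Invented scores for five responses on an assumed 0-4 rubric
human = [2, 3, 4, 3, 1]
ai = [2, 3, 3, 4, 1]
exact, adjacent, r = agreement_stats(human, ai)
print(exact, adjacent, round(r, 2))  # 0.6 1.0 0.81
```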

(7:27 - 7:31)
So they're using AI for the speaking part too. Exactly. Yeah.

(7:31 - 7:43)
So the system for speaking is called SpeechRater. And it listens to your responses. And it's, again, looking at those same core elements: fluency, vocabulary, grammar.

(7:43 - 7:52)
So I guess as long as it's like fair and accurate, then I shouldn't be scared of a computer listening to me. Yeah. I think that's the whole point.

(7:52 - 7:59)
Right. It's not that we're trying to get rid of human judgment completely. But it's more like, how can we combine the best of both? OK.

(7:59 - 8:04)
So AI is great at being consistent. Right. Because it's going to be the same every time.

(8:04 - 8:13)
Exactly. No matter what. But human raters, we bring our own experience, you know, our own understanding of language and how it works.

(8:13 - 8:21)
And those things together seem to give us the best, most reliable assessments. Like checks and balances. It's a safety net.

(8:21 - 8:28)
Yeah. You've got the AI to make sure everything's like, you know, on the up and up. But then you have an actual person being like, hold on, let me just double check this.

(8:28 - 8:33)
Exactly. Exactly. And it's not like they just turn the AI loose and, you know, forget about it.

(8:33 - 8:42)
They're constantly monitoring how it's doing, you know, making tweaks, making sure it's still doing its job properly. OK. Yeah.
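One way to picture this human-plus-AI "safety net" is a rule that averages the two ratings when they agree closely and sends the response to a second human rater when they don't. This is a hypothetical sketch: the one-point threshold, the averaging, the 0 to 4 rubric, and the name `combine_ratings` are our assumptions, not ETS's published scoring rule.

```python
def combine_ratings(human, ai, threshold=1.0, second_human=None):
    """Hypothetical rule for combining one human rating and one AI rating.

    Returns (score, needs_second_rater). If the ratings are within `threshold`
    points, the reported score is their average. Otherwise the response is
    flagged; once a second human rating is supplied, the two human ratings are
    averaged instead.
    """
    if abs(human - ai) <= threshold:
        return (human + ai) / 2, False
    if second_human is None:
        return None, True  # still waiting for another human rater
    return (human + second_human) / 2, True

# Ratings on an assumed 0-4 task rubric
print(combine_ratings(3, 3.5))                # (3.25, False)
print(combine_ratings(1, 4))                  # (None, True)
print(combine_ratings(1, 4, second_human=2))  # (1.5, True)
```

The design choice mirrors the conversation: the AI supplies consistency on every response, and a person is brought in exactly when the AI and the first human disagree.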

(8:43 - 8:52)
Can we go back to the equating thing for a second? Yes. Of course. So we talked about how it works with multiple choice, but I'm still a little fuzzy on the speaking and writing.

(8:52 - 8:59)
So like, what about, what about Juana and Lina? Oh, right. Who got the same score on the multiple choice, but they took different tests. Juana and Lina.

(8:59 - 9:06)
Yes. They're a perfect example of why this equating is so important, especially with different test forms. Yeah.

(9:06 - 9:13)
So, yeah, their raw scores, the number of questions they got right, were the same, but that didn't actually mean their English was the same. Yeah.

(9:13 - 9:16)
Because they took different tests. Exactly. So that's where scaling comes in.

(9:17 - 9:27)
So scaling is basically this: they take those raw scores and make some adjustments based on how hard that particular version of the test was. OK.

(9:27 - 9:42)
And they convert those into the scaled scores, which is what you see on the score report. You know, the zero to 30 for each section, and then the total score out of 120. So it's like you're taking the different tests and making them all the same.

(9:42 - 9:44)
You got it. That's a great way to put it. OK.

(9:44 - 9:52)
So no matter which version you take, you're being measured on the same scale. This is so much more complex than I even thought about. Right.

(9:52 - 9:56)
It's a whole process. I had no idea all this went into a TOEFL score. Yeah.
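To make the Juana and Lina example concrete, here is a sketch of mean-sigma linear equating, a standard textbook method, followed by a hypothetical raw-to-scaled conversion onto the 0 to 30 section scale. The episode does not say which equating method ETS uses. The sketch assumes a random-groups design (comparable groups of test takers took each form); which form each person took, the score lists, the 35-point raw maximum, and the linear conversion are all invented for illustration.

```python
from statistics import mean, pstdev

def linear_equate(raw_y, form_x_scores, form_y_scores):
    """Mean-sigma linear equating: place a raw score from form Y on form X's raw scale.

    Assumes the two score lists come from comparable (randomly equivalent) groups.
    """
    mu_x, sd_x = mean(form_x_scores), pstdev(form_x_scores)
    mu_y, sd_y = mean(form_y_scores), pstdev(form_y_scores)
    # Match standardized positions: (x - mu_x) / sd_x == (y - mu_y) / sd_y
    return mu_x + (sd_x / sd_y) * (raw_y - mu_y)

def to_scaled(raw_x, max_raw=35, max_scaled=30):
    """Hypothetical conversion: stretch form-X raw scores onto 0..max_scaled, then round and clamp."""
    scaled = round(raw_x / max_raw * max_scaled)
    return max(0, min(max_scaled, scaled))

# Invented data: comparable groups scored 4 points lower on form Y,
# so form Y is treated as the harder form.
form_x = [20, 22, 24, 26, 28]
form_y = [16, 18, 20, 22, 24]

juana_raw = 20  # took form X (the reference form)
lina_raw = 20   # took form Y

juana_scaled = to_scaled(juana_raw)
lina_scaled = to_scaled(linear_equate(lina_raw, form_x, form_y))
print(juana_scaled, lina_scaled)  # 17 21
```

Both answered 20 questions correctly, but because Lina's form was harder, her raw 20 equates to 24 on Juana's form and scales to 21 rather than 17.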

(9:56 - 10:11)
It's pretty amazing when you start to, like, really peel back the layers and see what's going on. So what would you say is, like, the main takeaway here for, like, our listener who's probably like, whoa, this is a lot. It is a lot.

(10:12 - 10:21)
I think the biggest thing to remember is that when you see a TOEFL score, it's not just some random number. Right. A lot of work went into that.

(10:21 - 10:26)
OK. Like, there was research, there was analysis, there was, you know, statistical modeling. Wow.

(10:27 - 10:31)
All to make sure that that score really means something. So I can trust that. Yeah.

(10:31 - 10:37)
That that number is accurate. Yeah. That it's a fair and accurate reflection of that person's English.

(10:37 - 10:39)
OK. Good to know. Right? Good to know.

(10:40 - 10:47)
So if you're out there and you're getting ready to take the TOEFL, just know that they're taking it seriously. Yes. So, good luck.

(10:47 - 10:51)
And maybe next time you hear standardized tests, you'll think twice. Yeah. It's not.

(10:51 - 10:54)
It's not simple. It's not just like a Scantron anymore.

(10:54 - 10:56)
No, no, not at all. Yeah. Definitely not simple.

(10:56 - 11:08)
Yeah. So we talked about the AI for like the writing and the speaking, but what about the rest of it? Like the reading and the listening on the TOEFL? So those are a little more straightforward, I guess you could say. OK.

(11:08 - 11:10)
Because it's multiple choice. Right. Right.

(11:10 - 11:23)
But they still use a lot of, you know, really careful statistical analysis to make sure that the questions are fair, the difficulty level is consistent. Right. From one version of the test to another.

(11:23 - 11:33)
So it's kind of like that equating thing. Exactly. It's all about making sure that no matter which version of the test you're taking, it's still measuring you on the same scale.

(11:33 - 11:40)
Gotcha. Gotcha. You know, it's funny, I always thought about these standardized tests as such a black box.

(11:40 - 11:45)
Yeah. You put your answers in and then a score pops out. But you never really know what like.

(11:45 - 11:50)
What's happening. What happens in between. But like now I'm like, oh, there are people.

(11:50 - 11:54)
There's a lot going on. There's a lot going on in there. More than meets the eye.

(11:54 - 12:00)
Yes. And it's not just like random. It's very, it's very purposeful.

(12:00 - 12:11)
A lot of research goes into these, a lot of development to make sure that those scores are actually meaningful and that you can rely on them. It makes you really appreciate all the work that goes into like making a test. Yeah.

(12:11 - 12:21)
Absolutely. It's really important because, you know, it's your future. These scores have real impact, right? I mean, they can determine whether or not you get into a certain school or get a job.

(12:21 - 12:30)
So it's really important that they're right. So to everyone out there who's, you know, maybe getting ready to take the TOEFL or any standardized test. Yeah.

(12:30 - 12:33)
Good luck. Your score is more than just a number. Absolutely.

(12:34 - 12:39)
Like you put in the work, you have the skills. It reflects something real. Go out there and do great.

(12:39 - 12:50)
And remember, there's a whole like team of people and AI rooting for you to get the score that you, you know, you've been working towards. That's a great, great note to end on. Yeah.

(12:51 - 13:01)
Well, thanks for joining me on this deep dive into the world of standardized testing. Who knew? It's been a pleasure. I'm sure we'll have you back to dive into something else soon.

(13:01 - 13:04)
I would love that. Awesome. Until next time, keep those brains buzzing.
