Generative AI Leader Practice Question: Understanding Data Types for Machine Learning

Ben Makansi
3 minute read

Let’s go over an example question that could appear on Google Cloud’s new Generative AI Leader exam, which was released in May 2025.

If you want more preparation help like this, you can check out my Generative AI Leader practice exams.


The Question

A university is developing a machine learning system to automatically sort student essays into categories such as “History,” “Biology,” and “Literature.” Faculty members have already reviewed the essays and assigned each one to the correct subject area. What type of data do these categorized essays represent?

A. Raw data

B. Structured data

C. Labeled data

D. Unlabeled data


Explanation

When preparing for machine learning projects, you have to understand what kind of data you are working with. Even though the Generative AI Leader exam is a foundational-level certification, this is a skill it tests as well.

Data can come in different forms, and each type has implications for how you can use it. In the case of this university, faculty members have already read the essays and tagged them with subject areas like History or Biology. This pairing of the essay (input) with its subject category (output) makes the dataset useful for supervised learning, where models learn by mapping inputs to outputs. This is literally the definition of labeled data, and it’s what allows the university to train a model that can classify new essays into the right categories.
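To make the input-output pairing concrete, here is a minimal Python sketch. The essay snippets, the `train` and `classify` helpers, and the word-counting approach are all made up for illustration; a real system would use a proper text classifier, but the shape of the labeled data would be the same.

```python
from collections import Counter, defaultdict

# Hypothetical labeled examples: each essay (input) is paired with a subject (output).
labeled_essays = [
    ("The treaty ended the war and redrew the borders of the empire", "History"),
    ("The revolution overthrew the monarchy and the empire collapsed", "History"),
    ("Cells divide through mitosis and enzymes regulate the process", "Biology"),
    ("Enzymes and proteins drive metabolism inside living cells", "Biology"),
    ("The novel uses metaphor and an unreliable narrator", "Literature"),
    ("The poem relies on metaphor and imagery in every stanza", "Literature"),
]

def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    return word_counts

def classify(model, text):
    """Pick the label whose training words overlap most with the new text."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) for label, counts in model.items()}
    return max(scores, key=scores.get)

model = train(labeled_essays)
prediction = classify(model, "Proteins and enzymes are found in cells")
print(prediction)  # Biology
```

The point is not the toy model but the data: without the subject attached to each essay, `train` would have nothing to learn from.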

So this is a pretty straightforward question, and our correct answer is:

C. Labeled data

Let's still go through the other answer choices, since each one names a real category of data that doesn't fit here. Raw data refers to information in its original state, before any organization or labeling. Structured data is information organized into fixed fields, such as numbers or dates in a database, which is very different from free-text essays. Unlabeled data is content with no tags or categories, which can't be used for supervised learning until labels are added. Since the faculty have already done the work of assigning categories, the dataset is not raw, not structured, and not unlabeled. The data in the question is clearly labeled data.
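If it helps to see the difference, here is a rough Python sketch of what these categories might look like side by side. The essay text, the field names in `structured_record`, and the `is_labeled` helper are all hypothetical and just for illustration.

```python
# Unlabeled data: essay text only, with no subject attached.
unlabeled_essays = [
    "The treaty ended the war and redrew the borders",
    "Cells divide through mitosis",
]

# Labeled data: the same essays after a (hypothetical) faculty review adds a subject.
faculty_labels = ["History", "Biology"]
labeled_essays = list(zip(unlabeled_essays, faculty_labels))

# Structured data: fixed fields, like a row in a database table.
structured_record = {"student_id": 1042, "submitted": "2025-05-01", "word_count": 850}

def is_labeled(dataset):
    """Return True if every item is an (input, label) pair."""
    return all(isinstance(item, tuple) and len(item) == 2 for item in dataset)

print(is_labeled(unlabeled_essays))  # False
print(is_labeled(labeled_essays))    # True
```

The essays themselves don't change between the two lists. What changes is whether each one carries the category that a supervised model would learn to predict.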

When you step back, the lesson is that understanding data types is fundamental to recognizing how AI systems are trained. Knowing the distinction between labeled and unlabeled data is especially important for supervised learning use cases, since only labeled datasets provide the input-output mapping needed to train a supervised model.


More practice

I’ve got more Generative AI Leader practice questions that you can review to make sure you pass the real exam.
