Why the GCP Professional Data Engineer Exam is so difficult (and how to pass)

Ben Makansi
8 minute read

The GCP Professional Data Engineer Certification Exam is notorious for being one of the most difficult cloud certification exams. I’ve passed this exam 3 times now, both to add the certification to my belt as a professional credential, and also to motivate my learning with a clear and challenging goal.

During my first attempt, with a bit of strategy, I was able to pass it after two months of serious studying. And since then, I’ve learned some things about what works and what doesn’t, which I’ll now share.

After I passed the exam for the first time, I was very glad I undertook the challenge. Preparing for the exam taught me much about data, about GCP, and about many computer science concepts I hadn’t previously learned. However, as I prepared, I noticed that many of the resources and approaches that people adopt to study for the exam are incongruous with what it really tests. This was corroborated later by the confusion and frustration I noticed in friends and online forum members who struggled to prepare. So I thought it could be helpful to write a few of my thoughts on an effective approach.

What makes the Professional Data Engineer exam so difficult

One fact must be drilled into your brain if you are to adequately prepare for the exam: the Google Cloud Professional Data Engineer Exam measures your ability to pick specific GCP solutions to a wide variety of problems.

This means that almost every question will describe a different business scenario, with a set of goals and solution requirements, and you have to choose which combination of tools to use and how to use them in the solution.

Here’s an example (that I just made up):

You work for a large trucking company with stations across North and South America. Your COO needs a real-time dashboard displaying time-series analytics from the trucks' GPS devices. The data must be highly available within the United States, regardless of origin, and the dashboard is needed as soon as possible.

A. Ingest with Apache Kafka, store the data in a new Cloud Bigtable instance with the "us-east1" region, and feed a Python App Engine dashboard.

B. Ingest with IoT Core and Pub/Sub, store the data in a BigQuery dataset with the "US" multi-region location, and feed a Looker Studio dashboard.

C. Ingest and perform in-flight analytics with Dataflow, store in a multi-region Cloud Storage bucket, and feed a Looker Studio dashboard.

D. Ingest with Pub/Sub to Cloud Spanner and set up the relevant metrics and dashboards in Cloud Monitoring.

This is a pretty basic example, and even this question requires understanding of at least 7 underlying concepts/tools and how they piece together to form a solution. So why do so many GCP courses fail to teach you how to answer questions like this?

What’s wrong with most online GCP courses

The problem is that they teach information, not understanding.

If you search for online GCP Professional Data Engineer courses, they will probably showcase a remarkable breadth of topics, giving you the impression that each one is a one-stop shop for all your exam prep needs. However, breadth is not what matters for passing the exam.

Taking the exam after learning massive amounts of information about Google Cloud Platform is like trying to write a song after taking a comprehensive course on all of the notes, chords, scales and arpeggios of the piano. What matters is not your understanding of independent facts, but your ability to put the pieces together to create something that has meaning in a given context.

This is precisely why Google states:

“The best way to prepare for the exam is to be competent in the skills required of the job.”

It tests your ability to intuit and deduce the right way to use the correct combination of tools for a solution.

This means that you cannot simply learn facts and concepts about different tools on GCP and expect to pass. On the other hand, the great variety of business scenarios posed on the exam means that you also cannot deeply understand just a few common use cases and expect that understanding to generalize.

This is precisely what makes studying for the GCP Professional Data Engineer Certification exam so difficult. The questions are highly varied, so you need a generalizable understanding of the domain. However, each question requires a specific solution to a very specific business scenario, so you also need practice deducing specific solutions from your generalized understanding.

So how do we devise a study strategy that achieves both?

In short, you must embark on a journey of distilling prep materials down to the most relevant skills and concepts for the exam, then consume that material in a way that continuously gives you clear feedback until you’re ready to take the exam.

You should quickly gain a basic understanding of GCP’s services and cloud data engineering concepts. Depending on your background, this may or may not require an online course. Essentially, you should know what all of the tools on GCP are and what their role is in the data lifecycle:

  • BigQuery

  • Cloud Storage

  • Dataflow

  • Dataproc

  • Bigtable

  • Pub/Sub

  • App Engine

  • Cloud Composer

…and so on. If you already use GCP, you can probably acquire this breadth of knowledge by augmenting your understanding with thoughtful articles and GCP documentation. If you’re completely new to GCP, you should seriously consider investing in an online course. Keeping in mind what I said above, however, you should be very mindful of how well a given course actually prepares you for the test.
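To make "know what each tool is and what its role is" concrete, here is a rough sketch of the kind of cheat sheet I mean, written as a small Python lookup. The stage labels ("ingest", "process", and so on) and the helper name `services_for` are my own illustrative choices, and the one-line descriptions are simplified summaries rather than official definitions.

```python
# Illustrative cheat sheet: service -> (lifecycle stage, one-line role).
# The stage names are an assumed grouping for study purposes, not GCP terminology.
SERVICE_ROLES = {
    "Pub/Sub": ("ingest", "asynchronous messaging for event streams"),
    "Dataflow": ("process", "managed Apache Beam pipelines, batch and streaming"),
    "Dataproc": ("process", "managed Spark and Hadoop clusters"),
    "Cloud Storage": ("store", "object storage for files and raw data"),
    "Bigtable": ("store", "wide-column NoSQL store for low-latency access at scale"),
    "BigQuery": ("analyze", "serverless SQL data warehouse"),
    "Cloud Composer": ("orchestrate", "managed Apache Airflow for workflow scheduling"),
    "App Engine": ("serve", "platform for hosting web applications"),
}


def services_for(stage):
    """Return the services in the cheat sheet tagged with the given stage, sorted by name."""
    return sorted(name for name, (s, _) in SERVICE_ROLES.items() if s == stage)


if __name__ == "__main__":
    for stage in ("ingest", "process", "store", "analyze", "orchestrate", "serve"):
        print(stage, "->", ", ".join(services_for(stage)))
```

If you can fill in a table like this from memory, and argue about where a service could sit in more than one stage, you probably have enough breadth to move on.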

Then, as you study, pay attention to the underlying concepts that are relevant to building GCP solutions.

One of the best ways to do this is to take realistic practice exams.

The very first thing I did after deciding to study for the exam for the first time was take Google’s official practice exam completely cold. It was painful and I got a 50%. But I really thought about each question before answering, and I’m convinced that taking the official practice exam first helped me a lot. Why? It created a roadmap in my mind for which concepts and problem-solving approaches were relevant. Thus, as I read and consumed material from other resources over the subsequent weeks, I was able to learn more and focus on what was important.

The point of this step is that once you have a cursory understanding of the tools on GCP, you should seek out resources that tell you more about what is relevant to devise GCP solutions to business problems.

As you gain a greater understanding, abstract away the important concepts into notes and flashcards that can be reviewed and drilled.

After each practice exam, I revisited the questions that I got wrong, but I did not simply copy the question and answer to a flashcard. Instead, I wondered: what underlying concept would have allowed me to answer not just this question, but similar questions with the details modified? That was what I used to create the flashcard. Sometimes, one question required multiple underlying concepts, and therefore the creation of multiple flashcards.

The point of this step is to capture the underlying, generalizable concept that you can use, regardless of the specifics of the question. Then these flashcards can serve as a way to easily drill the important concepts so that they are clear and available in your mind.

Drill the flashcards and be willing to continue to modify them as you realize they don’t quite capture the important concepts.

Pretty self-explanatory. Keep revisiting your flashcards, changing them or adding to the stack if necessary. Ideally, visit with greater frequency the flashcards that give you trouble.
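One simple way to review troublesome cards more often is a Leitner-style box system: a card you miss drops back to the first box and comes up every session, while cards you get right move to boxes that are reviewed less often. Below is a minimal Python sketch of that idea. The `Flashcard` class, the `INTERVALS` values, and the function names are all hypothetical choices for illustration, not the behavior of any particular flashcard app.

```python
from dataclasses import dataclass


@dataclass
class Flashcard:
    concept: str  # the generalizable idea, not a copied exam question
    box: int = 1  # box 1 is reviewed every session; higher boxes less often


# Assumed review schedule: a card in box b is due every INTERVALS[b] sessions.
INTERVALS = {1: 1, 2: 2, 3: 4}


def due_cards(cards, session):
    """Return the cards scheduled for review in the given session number."""
    return [c for c in cards if session % INTERVALS[c.box] == 0]


def record_answer(card, correct):
    """Promote a card answered correctly; send a missed card back to box 1."""
    card.box = min(card.box + 1, max(INTERVALS)) if correct else 1


if __name__ == "__main__":
    deck = [
        Flashcard("Choosing between a warehouse and a low-latency NoSQL store"),
        Flashcard("When a multi-region location is worth the trade-offs", box=3),
    ]
    for session in range(1, 5):
        print(session, [c.concept for c in due_cards(deck, session)])
```

With this schedule, a card sitting in box 1 shows up four times as often as one in box 3, which is the whole point: your time goes to the concepts that still give you trouble.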

It’s okay to retake practice exams with time in between, but make sure you didn’t simply memorize the questions.

And before too long, you will be ready to pass the Google Cloud Professional Data Engineer exam.

The overarching strategy

The essence of this strategy is to identify the highly relevant patterns and concepts as efficiently as possible, and then to understand them extremely well.

This requires a minimum amount of breadth to even understand what’s being talked about.

Then, you must subject your brain to the type of thinking required of the exam so you can abstract the relevant concepts that can be applied to a variety of problems.

Given this dense pool of information in whose relevance you can now be confident, you drill efficiently and focus on your weaknesses.

You continue this strategy, possibly revisiting previous steps as needed, until you pass realistic practice exams with about 85-90% accuracy.

So beware of courses that contravene this approach. They often give the appearance of thoroughness, but that doesn’t necessarily mean they will adequately prepare you to pass the exam.

Hope this helped.

If you're interested in passing the Professional Data Engineer exam, you can check out my course, which I personally constructed based on my experience:

https://www.gcpstudyhub.com/courses/google-cloud-certified-professional-data-engineer
