How to build a predictive model in higher education

You may be farther along than you think

April 7, 2022, By James Cousins, Senior Strategic Leader, Data and Analytics

If you’re planning to use predictive modeling in higher education for admissions, enrollment, student success, or any other institutional initiative, you’ve likely come across lengthy lists of what you need to do to prepare. But these rarely give you credit for what you’re already doing with your data.

If you use data extracts in any capacity at all (from your SIS or LMS, internal or external reporting, or even ad hoc data requests), you’re already laying the foundation for predictive modeling.

With just a few extra steps, your institution can leverage predictive analytics without overhauling existing processes.

In my direct experience across dozens of institutions, any institution with even a basic use of data is already accomplishing at least the first two steps. Furthermore, many institutions are doing the lion’s share of the third step (cleaning data) and the fourth (creating new variables). Most newcomers to predictive modeling only need to tackle steps 5 and 6, which are easier to achieve than you might think.

Why predictive modeling matters in higher education

Predictive modeling matters in higher education because it helps institutions make data-driven decisions that improve enrollment, retention, and more. Whether it’s forecasting application volume, identifying students in need of support, or improving degree completions, knowing how to build a predictive model can transform your institution’s ability to make data-informed decisions.

6 steps to build a predictive model in higher education

1. Collect data relevant to your target of analysis

In today’s world, you almost can’t help but have data on your topic of interest. Whether it’s student registration, application data, or newer systems like mobile apps and event registration platforms, institutions are constantly collecting data. I could go on, but the point is that you don’t have to “do” this step; it happens on its own.

Challenge: It’s worth acknowledging that access to the collected data may be an issue. For instance, local IT resources may gate your student database, or the vendor may host it on a remote, secure server. You can sometimes lower these barriers through partnerships, proof-of-concept projects, or technology.

2. Organize data into a single dataset

Organizing huge swaths of disparate data can be a complex, time-consuming element of the overall project. Therefore, it behooves you to focus on a core set of variables for initial passes.

Option Area 1: If you have the luxury of time, adding supplemental fields can help immensely, but you may be surprised to discover how much a model can tell you based on only the basic information already synthesized in your information systems.

Option Area 2: Student information systems (SIS) and data warehouses like EAB’s Edify frequently store information on prospective student enrollment, student success, or on-time completion in the same table or just a few different tables. What’s more—and this is the real a-ha moment—you’re probably already merging many of those tables and extracts to accomplish your required reporting and answer ad-hoc questions. In other words, you’re probably completing this step for other reasons anyway.
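To make the merging step concrete, here is a minimal sketch in plain Python of combining two extracts into one record per student. The extract names, field names, and values are made up for illustration, and it assumes both extracts share a `student_id` key. Your own systems will differ.

```python
from collections import defaultdict

# Hypothetical extracts; field names and values are illustrative only.
admissions = [
    {"student_id": "1001", "hs_gpa": 3.4},
    {"student_id": "1002", "hs_gpa": 2.9},
    {"student_id": "1003", "hs_gpa": 3.8},
]
registration = [
    {"student_id": "1001", "credits_attempted": 15},
    {"student_id": "1003", "credits_attempted": 12},
]


def merge_extracts(*extracts, key="student_id"):
    """Combine row lists into one record per student (an outer join on `key`)."""
    merged = defaultdict(dict)
    for extract in extracts:
        for row in extract:
            # Later extracts add their fields to the student's existing record.
            merged[row[key]].update(row)
    # Return records in a stable order, sorted by the key.
    return [merged[k] for k in sorted(merged)]


dataset = merge_extracts(admissions, registration)
for record in dataset:
    print(record)
```

Student 1002 has no registration row here, so that record simply lacks `credits_attempted`. Deciding how to treat gaps like that is part of the cleaning step that follows.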

3. Clean your data to avoid a misleading model

Your institution is likely cleaning data to some extent, but modeling may introduce a need for wider-reaching data cleaning to ensure accuracy. The fields directly included in reports and dashboards across campus are likely to be in good shape. Thus, you might need to start exploring fields you don’t report on, and those might not already be part of an existing cleaning process.

For example, imagine you have a hypothesis that digital and campus transaction data (e.g., dining hall usage or bookstore purchases) is predictive of successful student outcomes. The data might need cleaning and formatting before it can easily link to students’ other data.
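As a rough illustration of that kind of cleaning, the sketch below normalizes a card identifier so transaction rows can link to student records. It assumes, purely for the example, that the card ID contains the student ID's digits with stray padding or prefixes. The field names and rules are hypothetical, and real ID formats vary by system.

```python
def clean_student_id(raw):
    """Return the ID as a digit string without padding, or None if unusable."""
    if raw is None:
        return None
    # Keep digits only, then drop leading zeros used as padding.
    digits = "".join(ch for ch in str(raw) if ch.isdigit()).lstrip("0")
    return digits or None


def clean_transactions(rows):
    """Split rows into linkable records and rejects that need manual review."""
    cleaned, rejected = [], []
    for row in rows:
        student_id = clean_student_id(row.get("card_id"))
        if student_id is None:
            rejected.append(row)
            continue
        cleaned.append({"student_id": student_id, "amount": float(row["amount"])})
    return cleaned, rejected


# Hypothetical dining-hall transactions with inconsistent ID formatting.
transactions = [
    {"card_id": " 0001001 ", "amount": "7.50"},
    {"card_id": "S-1003", "amount": "12.00"},
    {"card_id": "", "amount": "3.25"},
]

cleaned, rejected = clean_transactions(transactions)
print(cleaned)
print(len(rejected), "row(s) could not be linked to a student")
```

Keeping the rejected rows, rather than silently dropping them, makes it easier to see how much data the cleaning rules are costing you.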

Newcomers to predictive modeling should still take heart, though. Data preparation may be 80% of the work in a modeling project, but much of the data you’ll be relying on is cleaned for other end-uses, and what remains is only a fraction of the total.

4. Create new, useful variables to understand your records

My claim that you’re probably already creating new variables is based on institutions’ perpetual efforts to create and refine useful reports. These variables can enhance predictive modeling.

Example: Consider a flag for a student taking a lab-science course (or any other required course). I once worked at an institution that required all students to take at least two lab-science courses, which were historically the most constrained for capacity. To ensure that students weren’t working themselves into a bottleneck where they couldn’t find an open lab-science course, we created a new variable—a flag for any such course—and tracked it. That helped us to recognize which students still needed their lab-science courses and encourage them to fit them in.

Here’s the turn, though—that same flag is a stellar candidate for a predictive model targeting retention or on-time completion! The lab-science flag is very specific, but you (or someone else at your institution) may have your own custom creations to bring into a model.
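A flag like this is usually only a few lines of logic. The sketch below derives a lab-science flag and a count of remaining required courses from a student's completed course list. The course codes are invented, and the requirement of two courses comes from the example above, not from any general rule.

```python
# Hypothetical course codes that count as lab-science courses.
LAB_SCIENCE_COURSES = {"BIO101L", "CHM110L", "PHY120L"}
# The institution in the example required at least two.
REQUIRED_LAB_COURSES = 2


def lab_science_features(completed_courses):
    """Derive model-ready variables from a list of completed course codes."""
    # A set intersection ignores repeated attempts at the same course.
    taken = len(set(completed_courses) & LAB_SCIENCE_COURSES)
    return {
        "has_lab_science": int(taken > 0),
        "lab_science_remaining": max(REQUIRED_LAB_COURSES - taken, 0),
    }


print(lab_science_features(["ENG101", "BIO101L"]))
print(lab_science_features(["ENG101", "MAT120"]))
```

The same function serves both purposes: the report that tracks who still needs a lab-science course, and the input column for a retention or completion model.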

Another redeeming quality about this step? It can be genuinely fun. Thinking of new variables to predict a critical outcome is a creative process. Data analysis involves plenty of mechanical, objective tasks—subjective, creative, and contextual problems like “What else might help us understand this outcome?” are gems.

If you don’t have time for this step in your first pass at modeling, don’t worry—you can build a model without excesses of new variables, and revisit this step in successive iterations.

5. Choose a methodology/algorithm

There is a wide world to explore once you start learning more about building a predictive model and choosing the right algorithm. At the same time, it can be surprisingly easy to enter this phase because there are droves of resources available.

Where to Start: While there’s no one best source for all people to learn any given concept, I recommend that you start by searching through higher education analytics forums (AIR Forums, NACAC, NASPA, and other consortia specific to your focus).

It’s tempting to start with statistics-first, use-case-second-style sources, but that may leave you inundated with information irrelevant to your intended usage. When you find use cases and methodology references in peer-reviewed venues, you can be reasonably confident that the methodology or algorithm you discover has been vetted.

From there, you can explore less use-specific sources for knowledge, like YouTube compilation videos, StackOverflow, and data science blogs.

6. Build the model with the right tools

Everyone who builds predictive models today uses an application to do it, whether it’s open-source, licensed software, or a homegrown tool. So, when you hear about advanced algorithms or read blog posts that reference dozens of steps, don’t fall under the impression that you will need to perform them manually.

Tools are the single most influential enabler of predictive modeling in the recent past. Statistical software has developed so rapidly that there is now an application designed for nearly any user. Newer options aside, time-tested solutions exist, and using one with a track record can alleviate concerns you may have about modeling without an extensive background.

For that matter, while a background in predictive modeling will naturally benefit you when you’re building a predictive model, data analysis and its professionals are distinctly collaborative. Accomplished modelers are everywhere across the internet sharing their stories, caveats, and best practices. Even cursory searches for “how to” resources return a surprising variety of use cases, so there’s a great chance that you’ll find a resource that runs parallel to your needs.
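To demystify what a modeling tool automates, here is a small sketch of one common algorithm, logistic regression, fit by gradient descent in plain Python. It is an illustration under stated assumptions, not a recommendation to hand-roll models: the data is made up, the two features (a lab-science flag and a scaled credit load) are hypothetical, and in practice an established tool would handle this step along with validation and diagnostics.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Predicted probability of the outcome (here, retention) for one student."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(X, y, learning_rate=0.5, epochs=3000):
    """Fit logistic regression with batch gradient descent on the log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        # Step against the average gradient.
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Made-up toy data: [has_lab_science, credits_attempted / 15] -> retained (1) or not (0).
X = [
    [1, 1.0], [1, 1.0], [1, 1.0], [1, 0.8], [1, 0.6], [1, 0.6],
    [0, 1.0], [0, 1.0], [0, 0.8], [0, 0.6], [0, 0.6], [0, 0.4],
]
y = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0]

weights, bias = fit_logistic(X, y)
p_high = predict_proba(weights, bias, [1, 1.0])  # flag set, full credit load
p_low = predict_proba(weights, bias, [0, 0.4])   # no flag, light credit load
print(round(p_high, 2), round(p_low, 2))
```

Everything before the last few lines is what software does for you. The part that draws on your institutional knowledge is choosing the variables and interpreting the predicted probabilities.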

Predictive model building may be net-new work, but it is within reach. I’ve had the fortune to support the implementation of Rapid Insight software in offices that made it abundantly clear how unfamiliar the practice of predictive modeling was to them. All the same, in mathematics, we stand on the shoulders of giants. The statistical theory behind predictive modeling is now (in many ways) automated through software, leaving it more accessible than ever before.

You’re closer than you think

Yes, predictive modeling involves a few steps you aren’t taking yet. However, the idea that you need to start from square one is a misconception. Predictive modeling is not the process of collecting, cleaning, organizing, or augmenting data. Instead, it is the process of analyzing data. That means that the data you have on hand right now is more ready than you might think for predictive modeling.

You can always find improvements by refining your data cleaning process, or the variety of fields you create to enhance your data. However, I hope that you take this away: to “get started” with predictive modeling, you need only slightly expand on the work you’ve already done.
