Using REGEXEXTRACT To Pull The Exact Data You Need

If you’ve used Excel with data exported from a CRM, you’ll know it doesn’t ever come the way you want it to!

Order references get mixed up with descriptions, IDs get buried inside notes, and codes sit inside brackets or comments.

When your data comes like this, you have to spend time cleaning it – if you have a weekly or even daily report to do with it, that’s a lot of time you’ll be spending tidying up!

But even worse, these issues aren’t always consistent with each other, sometimes there are no notes alongside entries with them!

The challenge then becomes cleaning the data quickly, and finding a way to get the same result from different pieces of data.

How Regex Fixes This

Regex works differently to the classic Excel text functions, it has its own rules and language, which you can learn about with our regex in Excel article. All you need to know for this guide, is that it doesn’t rely on fixed positions, exact lengths or consistent formats, it works off a description of the data.

You don’t tell Excel where something is, but what it looks like. Then Excel goes off and finds it wherever it exists, and ignores everything else – perfect for our task!

This is exactly why regex works so well with CRM exports. Even if:

  • Notes are in some rows and not others
  • Formats vary between entries
  • Extra text appears before or after the value you need

You can just apply one rule and get a consistent result.

A header image with

REGEXEXTRACT Syntax

So regex is great, but what about the actual functions? Which do we want to use and where? The function itself is intentionally minimal:

=REGEXEXTRACT(text, pattern)

Text is just the text we are looking through.

The most important thing to look at here is the pattern. Unlike the FIND function, which locates a specific text, we are defining a rule for it to work with. So what does that look like?

Example: Extracting Reference Tags

So, enough explanation, let’s go through an example from our City of London Excel workshops – a real use of regex with REGEXEXTRACT.

Let’s say you are working with an export from a CRM that is entirely unorganised, and your job is to extract the reference tags from a few cells of data that are all different.

Here is the sample data we are working from:

Follow-up required – Ref #URGENT today
User access restored – Ref #RESET 
Account flagged for review – Reference #HOLD thursday

And from this, you need Excel to output:

URGENT
RESET
HOLD

You can use:

=REGEXEXTRACT(A2, “#([A-Z]+)”)

Shows REGEXEXTRACT cleaning up messy data

The cell reference is A2, to scan the first piece of data, and the pattern is “#([A-Z]+)”. Here, the hashtag is being used at the start, as that’s where the reference tags are within the data. So the function opens up the data, goes to the hashtag, then captures [A-Z], which is the set of all uppercase characters, exactly what we need!

Excel searches A2 for the uppercase text after the hashtag and returns it – ignoring everything else. No matter how messy your CRM data is, what comes before and after the text, you’ll always get the reference tag.

What If There Isn’t A Match?

So regex can return a result with really messy data, but what if there just isn’t any result in the cell you’re looking at?

For example, what if our sample data in the previous section looked like this:

Follow-up required – Ref #URGENT today
User access restored – Ref #RESET 
Account flagged for review

No hashtag, no reference tag.

Well REGEXEXTRACT would give you an error. You can audit the formula which will tell you that the rule didn’t apply to the data you gave it, but for reporting we don’t really want to see an error! To handle this, just wrap the formula in IFERROR.

=IFERROR(REGEXEXTRACT(A2, “#([A-Z]+)”), “No reference tag”)

Shows REGEXEXTRACT wrapped with an IFERROR cleaning up data even when a match isn't found

Now if a row doesn’t contain a reference tag, we’ll get “No reference tag” in that cell, instead of an error – much better if you want to present this data. This then makes the formula totally safe to drag down a large dataset, as you can see where things went wrong without having errors all over the place.

Using REGEXEXTRACT As A Building Block

REGEXEXTRACT is great, but it’s usually just a part of the process. Think of it like the first in a series of building blocks. Once you have a strong foundation, you can build upon it easily – once the logic is sound you don’t have to worry about it afterwards.

Once you’ve extracted clean reference tags, you could:

  • Categorise records based on the tag

  • Filter or group reports more easily

  • Apply conditional formatting

Instead of fighting the original text over and over again, you solve the problem once and reuse the result everywhere else.

Conclusion

Messy data and CRM exports are (sadly!) not a one-off problem, usually you have to deal with them constantly. Maybe it’s every month, but for some people it’s every week – or even every day!

Excel has always given its users tools to clean up this data, but with the introduction of regex – it’s more efficient than ever before. REGEXEXTRACT can replace a whole group of nested formulas, and gives you one quick function to build on. Functions being stacked on top of each other is what causes a lot of performance issues, so having just one function to extract the data you need is a great start.

If you define the logic well, you can stop spending so much of your time fixing the same problems every week, and get back to actually analysing your data.

About Ben Richardson

Ben Richardson is the Director of Acuity Training, and has been leading the company for more than 10 years.
He is a Natural Sciences graduate from the University of Cambridge and a qualified accountant with the ICAEW, bringing a strong analytical and technical background to his writing.
He previously worked as a venture capitalist and banker, gaining extensive experience with Excel from building financial models and later expanded into SQL, Power BI and other data technologies.
His writing is centred around real-world examples, helping readers understand not just how tools work, but how they can be applied to day-to-day work.