Regex in Excel & REGEXREPLACE

Microsoft has quietly been adding regular expression support to Excel, also known as regex.

Instead of building up a few different text functions on top of each other, you can now use regex inside Excel formulas to search for patterns, and clean data much more efficiently.

Our favourite of these new additions is REGEXREPLACE, which lets you find text based on patterns and replace data in one simple formula.

Before we get started with that, let’s break down what regex is, why it’s so useful, and exactly what it means for us as Excel users!

Before We Start – Update Excel!

These functions are pretty new, so if your Excel isn’t set up to automatically update, you might not have them.

Open up Excel, and type =REGEX into a cell, and make sure you see these functions:

Shows the new Regex functions in Excel

If you don’t then just click File >  Account > Update Options > Update Now.

Shows where to find the

So What Is Regex?

Regex is a sequence of characters that specifies a match pattern in text. In describes the actual patterns of text with its own unique language, rather than exact values. We teach this on our more advanced Excel courses in London because once you get the hang of it, it’s so much more flexible.

For example, it can find:

  • Any number, regardless of length
  • Everything inside of brackets
  • Text that starts with one character but ends with another
  • IDs, reference codes, or email addresses

It’s built around structure, not just the position of characters.

A Simple Regex Example

Imagine you have reference codes in your data like these:

INVA-48293

INVAB-10384

INVAC-92847

And that your goal is to pull out the numbers, and strip out the “INV-” segments. With old school Excel tools like the TRIM function, you would often rely on exact positions or lengths, which won’t work for everything in this data set since the lengths aren’t all the same.

With REGEXREPLACE, you can remove everything before the numbers and just describe the pattern you want to remove.

Shows a simple example of regexreplace cleaning up a dataset with different character lengths

How Does REGEXREPLACE Work?

We just saw an example of REGEXREPLACE cleaning up a varied dataset, so how exactly does it work?

It finds text based on the regex pattern that you specify and replaces it with something else.

The syntax is:

=REGEXREPLACE(text, pattern, replacement)

Text – The original text or cell of text you want to replace

Pattern – The regex pattern to search for within the text

Replacement – What to replace it with

So in the above example we took A2, which is INVA-48293 for text, “.*-” for the pattern, and “” for the replacement. We made the replacement blank, so it just removes everything that it finds within the pattern.

Let’s break down the pattern: “.*-“

The first part .* is a wildcard, and matches any characters at the start of the text, – matches the dash. So together, everything up to and including the dash will get replaced with nothing. Excel removes the entire prefix, leaving just the numeric part. That’s why it works even on this inconsistent data set, because regex has its own set of rules, including wildcards!

Advanced REGEXREPLACE Example

Let’s try working with another data set, this time trying to pull out the numbers, and change the country code to a number equivalent. Our sample data is:

INV-49283 (UK)

INV-10384 (EU)

INV-92847 (US)

And we want to strip out the number, then change the country codes so that (UK) becomes 01, (EU) becomes 02, and (US) becomes 03.

With the traditional Excel formulas, cleaning and analysing this data would be a pretty awkward process, as it has country codes which are also within brackets. With REGEXREPLACE, you can just strip out everything that isn’t a number. Type in the following, don’t worry, we’ll break it down!

=REGEXREPLACE(A2, “.*-([0-9]+).*UK.*”, “$1-01”)

So this time, our function reads what is in A2, then uses the wildcard again to strip away the INV- just like before. After that, you’ll see we have ([0-9)+). This specifies to regex that we only want characters from 0-9, which encapsulates all numbers.

So now Excel just has 49283, but we also want to change it based on the country code. We use another wildcard surrounding UK – *UK.* – then with the last parameter which is the replacement, we use “$1-01”.

Now the formula has stripped out the numbers, and found the country code. With this, it replaces the text it currently has, which is just 49283 with 49283-01. $1 is the container for the text we stripped out, so if you were to change it to $1-England, that would produce 49283-England! You can adjust this for the other country codes in your data set, and clean it up all with one REGEXREPLACE.

A more advanced example of regexreplace cleaning up data with country codes.

Thinking In Patterns, Not Positions

Some of the functions here might look a little intimidating, but they are incredibly powerful. The key here is to try and switch your thinking, so that you are looking for patterns in your text, instead of just positions.

Regex has loads of different syntax you can use, there’s a really great regex cheat sheet here so you can get some ideas! We recommend you give it a look, and just try out a few in your own spreadsheets, it’s definitely worth the effort.

With regex, Excel can now find data with more accuracy and flexibility than ever before. It’s great for data with:

  • Inconsistent prefixes or suffixes
  • Numbers that aren’t in defined positions
  • Extra characters that you want to adjust

All you have to do is build out the initial rule properly, and Excel will apply it consistently.

Conclusion

Regex actually comes from programming, and has been standard there for many years. It’s an amazing tool for data so we are very happy it finally landed in Excel!

With REGEXREPLACE, Excel users finally gain access to powerful text handling, and no longer need to use functions within functions, or apply several back to back in order to clean their data.

And alongside all this, your worksheets are easier than ever to read, so your colleagues won’t get lost in your process!

About Ben Richardson

Ben Richardson is the Director of Acuity Training, and has been leading the company for more than 10 years.
He is a Natural Sciences graduate from the University of Cambridge and a qualified accountant with the ICAEW, bringing a strong analytical and technical background to his writing.
He previously worked as a venture capitalist and banker, gaining extensive experience with Excel from building financial models and later expanded into SQL, Power BI and other data technologies.
His writing is centred around real-world examples, helping readers understand not just how tools work, but how they can be applied to day-to-day work.