When you get raw Excel data, you sometimes find that text, numbers and non-alphanumeric characters are all mixed up together, and you need to remove the non-alphanumeric characters before you can work with the data.
One way of removing non-alphanumeric characters would be to go through the dataset cell by cell and delete them, but this method is not practical, especially when dealing with large datasets.
This tutorial presents a few approaches you can use to easily remove unwanted non-alphanumeric characters from the dataset.
Two methods for removing non-alphanumeric characters in Excel
The following two methods are some of the approaches you can use to remove non-alphanumeric characters in Excel:
METHOD 1 – Use Excel’s SUBSTITUTE function
If your dataset contains only one type of non-alphanumeric character, you can use the SUBSTITUTE function to replace them with empty strings.
Here’s an example of a dataset with a single non-alphanumeric character type, the asterisk (*):
We’ll use the SUBSTITUTE function to replace asterisks (*) with empty strings:
Step 1- Enter the formula =SUBSTITUTE(A2, ” * “,” “) in cell B2:
Step 2 – Press Enter :
Step 3 – Use the fill handle in cell B2 to drag the formula down and copy it to the rest of the empty cells in the dataset:
All asterisks (*) have been replaced by empty strings.
Nested SUBSTITUTE functions
The SUBSTITUTE function cannot replace more than one non-alphanumeric character at a time.
If your dataset contains two or more non-alphanumeric character types, you can use at least two nested SUBSTITUTE functions to delete them.
For example, if you wish to delete 2 non-alphanumeric character types, use 2 nested SUBSTITUTE functions, and if you wish to delete 3 types, use 3 nested SUBSTITUTE functions.
The following data set contains two types of non-alphanumeric characters: the asterisk (*) and the pound sign (#) :
We’ll use 2 nested SUBSTITUTE functions to remove the non-alphanumeric characters in the following steps:
Step 1 – Enter the formula =SUBSTITUTE(SUBSTITUTE(A2, ” * “,” “), ” # “,” “) in cell B2:
Step 2 – Press Enter and drag the fill handle down to fill the empty cells with the formula:
Non-alphanumeric characters have been replaced by empty strings.
You can continue to add more and more nested SUBSTITUTE functions to your formula, depending on how many non-alphanumeric characters you wish to remove. But the more SUBSTITUTE functions you add, the heavier your formula becomes.
A more practical way of removing all non-alphanumeric characters at once is to use user-defined functions.
See also: Using Excel VLOOKUP to return multiple values vertically
METHOD 2 – Use a user-defined function
To remove all non-alphanumeric characters from your dataset, you need to write an Excel VBA macro using the following steps:
Step 1- In the Developer tab, click on the Visual Basic command or press Alt + F8:
TheVisual Basic Editor (VBE) window appears.
Step 2 – Click on Insert>>Module to create a new module:
A new module will be created:
Step 3 – In the new module, enter the following code and click on the Save button:
Function RemoveNonAlphaN(str As String) As String Dim ch, bytes() As Byte: bytes = str For Each ch in bytes If Chr(ch) Like "[A-Z.a-z 0-9] " _ Then RemoveNonAlphaN = RemoveNonAlphaN & Chr(ch) Next ch End Function
Step 4 – Click on the View Microsoft Excel button or press the Alt + F11 keyboard shortcut to return to the current worksheet. Enter the formula =RemoveNonAlphaN(A2) in cell B2 of the dataset we’re using as an example:
Step 5 – Enter and drag down using the fill handle:
All non-alphanumeric characters will be removed.
Conclusion
In this tutorial, we’ve presented two methods you can use to remove non-alphanumeric characters from your dataset. You can use whichever one suits you best.