Connect – Microsoft Dynamics 365

 

How to perform C# regex multiple replacements

Sep 24, 2019

By Jeff Ballard

As I was testing the XML request for a web service, I would periodically get this response back: [400]: Error parsing xml.   

After some troubleshooting, I discovered that the error was caused by characters in the actual data that XML reserves for markup, the ones covered by the five predefined XML entities. The most common culprit was an ampersand (&) embedded in the address and account data.
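To see why a single raw ampersand is enough to break a request, here is a minimal sketch using LINQ to XML's XDocument.Parse. This is an assumption: the post doesn't name the parser behind the web service, but any conforming XML parser rejects the same input. The class and method names are hypothetical.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class XmlParseCheck
{
    // Hypothetical helper: returns true if the text is well-formed XML,
    // false if the parser rejects it.
    public static bool ParsesAsXml(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // A bare "&" (or "<") inside element text ends up here.
            return false;
        }
    }
}
```

With this helper, "<Address>Smith & Sons</Address>" is rejected, while the escaped form "<Address>Smith &amp; Sons</Address>" parses. Strictly speaking, only "&" and "<" must be escaped in element text, but escaping all five is the usual safe choice.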

I needed a way to replace those characters with their XML entity references, but didn't want to replace each one separately, so I thought I'd just use a regular expression.

C# regex replace multiple matches

The static Regex.Replace method has several overloads (four in .NET 3.5, the version used here), but the basic syntax in .NET is Regex.Replace(string input, string pattern, string replacement). That didn't work for me because I needed the replacement value to vary based on what was matched.

For example, the simple replacement Regex.Replace(input, "&", "&amp;") wasn't what I wanted, as I would have to repeat it for each of the other characters. The pattern can include multiple items, for example Regex.Replace(input, "&|\"|<|>|'", replacement), but then the same replacement would be used for every match. So I needed the overload that accepts a MatchEvaluator delegate, which is called each time a regular expression match is found during a Replace operation and returns the text to substitute for that match:

Regex.Replace(string input, string pattern, MatchEvaluator evaluator)
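Before bringing in a dictionary, it may help to see the MatchEvaluator overload on its own. The sketch below passes a named method with a switch statement; the class and method names are hypothetical, and this is one way to use the overload, not the approach the post settles on.

```csharp
using System.Text.RegularExpressions;

public static class SwitchEvaluatorEscape
{
    // Matches the MatchEvaluator signature: takes a Match, returns its replacement.
    private static string EntityFor(Match m)
    {
        switch (m.Value)
        {
            case "&": return "&amp;";
            case "<": return "&lt;";
            case ">": return "&gt;";
            case "\"": return "&quot;";
            case "'": return "&apos;";
            default: return m.Value; // not reachable with the pattern below
        }
    }

    public static string Escape(string input)
    {
        // Regex.Replace calls EntityFor once per match, left to right, and
        // does not re-scan the text it substitutes, so the "&" inside a
        // freshly inserted "&lt;" is never escaped a second time.
        return Regex.Replace(input, "&|\"|<|>|'", new MatchEvaluator(EntityFor));
    }
}
```

The single pass is a practical advantage over chaining five separate Replace calls, where "&" has to be handled first or the entities produced by the other calls get escaped again (for example "&amp;lt;").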

Given that my search and replace values were simple, I decided to use a generic Dictionary and store each character and its entity reference as a key/value pair.

var xmlEntityReplacements = new Dictionary<string, string> {
    { "&", "&amp;" },
    { "'", "&apos;" },
    { "<", "&lt;" },
    { ">", "&gt;" },
    { "\"", "&quot;" }
};


Great, now I can revise the Regex like this:
Regex.Replace(input, "&|\"|<|>|'", delegate(Match m)
{
    return xmlEntityReplacements[m.Value];
});
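Here is a self-contained sketch of the anonymous-method version above. One change is an assumption, not part of the post: the evaluator uses TryGetValue and falls back to the matched text, so if the pattern and the dictionary ever drift apart, the unmatched text is left alone instead of throwing a KeyNotFoundException. The class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnonymousMethodEscape
{
    // Built once and shared; only read after initialization.
    private static readonly Dictionary<string, string> xmlEntityReplacements =
        new Dictionary<string, string>
        {
            { "&", "&amp;" },
            { "'", "&apos;" },
            { "<", "&lt;" },
            { ">", "&gt;" },
            { "\"", "&quot;" }
        };

    public static string Escape(string input)
    {
        // C# 2.0 anonymous method, converted to a MatchEvaluator.
        return Regex.Replace(input, "&|\"|<|>|'", delegate(Match m)
        {
            string replacement;
            // Assumption: fall back to the original text rather than throw.
            return xmlEntityReplacements.TryGetValue(m.Value, out replacement)
                ? replacement
                : m.Value;
        });
    }
}
```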


Not bad, but I am duplicating the items in the pattern (they are already in the dictionary), and why use an anonymous method when I'm on .NET 3.5 and have lambda expressions and LINQ available?

So, a little refactoring was in order.

I first needed to get the dictionary keys and join them into a pipe-delimited string. The Keys collection has a CopyTo() method that copies the keys into an array, and from that array I could build the string. That works, but I didn't want to declare an array just to hold the keys and then call CopyTo; that's an extra step. This proved another great use for LINQ and its extension methods:

Regex.Replace(source, string.Join("|", xmlEntityReplacements.Keys
    .Select(k => k.ToString()).ToArray()), m => xmlEntityReplacements[m.Value])
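The five XML characters contain no regex metacharacters, so joining the raw keys works here. If you reuse the idea with other keys (".", "+", "(" and so on), each key should be passed through Regex.Escape first, and longer keys should come before shorter ones that share a prefix, because .NET alternation takes the first alternative that matches. A minimal sketch of that generalization, with hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternFromKeys
{
    // Builds "key1|key2|..." from literal keys: longest first, each escaped
    // so characters such as ".", "+", "(" or "|" are matched literally.
    public static string BuildPattern(IEnumerable<string> keys)
    {
        return string.Join("|", keys
            .OrderByDescending(k => k.Length)
            .Select(k => Regex.Escape(k))
            .ToArray());
    }

    public static string ReplaceAll(string source, IDictionary<string, string> replacements)
    {
        string pattern = BuildPattern(replacements.Keys);
        return Regex.Replace(source, pattern, m => replacements[m.Value]);
    }
}
```

Without escaping, a key of "+" would produce an invalid pattern and "." would match any character. On .NET 4 and later, string.Join also accepts an IEnumerable<string>, so the ToArray() call is only required on .NET 3.5.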


Here's the code converted to a method:

// Place in a static class; requires these using directives at the top of the file:
// using System.Collections.Generic;
// using System.Linq;
// using System.Text.RegularExpressions;

/// <summary>
/// Replaces the five characters covered by XML's predefined entities
/// with the corresponding entity references.
/// </summary>
/// <param name="source">string to search</param>
/// <returns>source string with the characters replaced, or the original string if it is null or empty</returns>
public static string ReplaceXmlEntity(string source)
{
    if (string.IsNullOrEmpty(source)) return source;

    // The XML specification defines five "predefined entities" representing
    // special characters, and requires that all XML processors honor them.
    var xmlEntityReplacements = new Dictionary<string, string> {
        { "&", "&amp;" },
        { "'", "&apos;" },
        { "<", "&lt;" },
        { ">", "&gt;" },
        { "\"", "&quot;" }
    };

    // Join the dictionary keys into a pipe-delimited pattern to serve as the
    // regex search values, then let the MatchEvaluator look up each match's
    // replacement in the dictionary.
    return Regex.Replace(source, string.Join("|", xmlEntityReplacements.Keys
        .Select(k => k.ToString()).ToArray()), m => xmlEntityReplacements[m.Value]);
}
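The method above rebuilds the dictionary and the pattern on every call. If it is called often, one possible variation (a sketch, not the post's code; the class name is hypothetical) builds both once in static fields and reuses a single Regex instance, which .NET documents as immutable and thread safe:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class XmlEntityEscaper
{
    // Built once per AppDomain instead of on every call.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>
        {
            { "&", "&amp;" },
            { "'", "&apos;" },
            { "<", "&lt;" },
            { ">", "&gt;" },
            { "\"", "&quot;" }
        };

    // Pattern built from the keys once; declared after Replacements so the
    // dictionary is already initialized when this field is set.
    private static readonly Regex EntityPattern = new Regex(
        string.Join("|", Replacements.Keys.Select(k => Regex.Escape(k)).ToArray()));

    public static string ReplaceXmlEntity(string source)
    {
        if (string.IsNullOrEmpty(source)) return source;
        return EntityPattern.Replace(source, m => Replacements[m.Value]);
    }
}
```

For example, ReplaceXmlEntity("Smith & Sons <\"Main\" St.> O'Brien") returns "Smith &amp; Sons &lt;&quot;Main&quot; St.&gt; O&apos;Brien". The framework's System.Security.SecurityElement.Escape method also escapes these same five characters and may be worth comparing on your target framework before rolling your own.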


Note that for more complex patterns, the dictionary approach won't work, because the MatchEvaluator looks up the matched text as a key in the Dictionary, and the matched text won't be there. For example, say your pattern is "T.p" (an uppercase T, followed by any character, then a lowercase p), the replacement value is "Top", and the input string is "Just Tap and Tip." The dictionary would have a key of "T.p" and a value of "Top". The pattern "T.p" matches both "Tap" and "Tip", but those words aren't keys in the dictionary, so the lookup throws a KeyNotFoundException: "The given key was not present in the dictionary."

I'll leave that as an exercise for you to do if you so choose.
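For anyone taking up the exercise, here is one possible approach, sketched under stated assumptions: wrap each pattern in its own named group, combine them with "|", and have the evaluator check which group took part in the match. The names PatternReplacer, ReplaceByPattern and the group names r0, r1, ... are hypothetical, and the sketch assumes the individual patterns don't use numbered backreferences or group names of their own that could clash.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PatternReplacer
{
    // Each rule pairs a regex pattern (not a literal) with its replacement.
    // Rules are tried in list order, so earlier patterns win ties.
    public static string ReplaceByPattern(string input,
        IList<KeyValuePair<string, string>> rules)
    {
        // Wrap each pattern in a named group: (?<r0>T.p)|(?<r1>...)|...
        string combined = string.Join("|", rules
            .Select((rule, i) => "(?<r" + i + ">" + rule.Key + ")")
            .ToArray());

        return Regex.Replace(input, combined, m =>
        {
            // Find which rule's group participated in this match.
            for (int i = 0; i < rules.Count; i++)
            {
                if (m.Groups["r" + i].Success)
                    return rules[i].Value;
            }
            return m.Value; // not expected to be reached
        });
    }
}
```

With a single rule of "T.p" → "Top", the input "Just Tap and Tip." becomes "Just Top and Top." because the lookup now goes by group rather than by the matched text.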

If you’d like to learn more about C# regex replace multiple matches, contact the technology consultants at Wipfli. You can also keep reading more of our technology-focused articles here.
