.Net Regular Expressions Demystified Part 1

In very simple terms, a regular expression is a group of characters that defines a pattern, and we use that pattern to find the specific information we need.
Some of those characters have special meanings to the regular expression engine, which ships with the .NET Framework and is exposed through the System.Text.RegularExpressions.Regex class.

Goal of Article

My goal in this article is to give you a basic understanding of regular expressions in a very short amount of time. I will guide you far enough that you can create and use your own regular expressions in your .NET applications. By the end of this series you should be able to write regular expressions that match a standard phone number, social security number, email address, postal code and so on.

Prerequisites

You should have a basic understanding of the C# language and some basic concepts of OOP (object-oriented programming).

Let’s Jump into Regular Expressions

Before I start discussing the syntax and regular expression engine let me answer some questions that you might have in your mind.


If you feel confused and think regular expressions are hard to learn, trust me, they are not that bad. In fact, they are easy to learn and use. Once you are up and running, I believe you will be doing more cool and fun stuff with regex and making full use of the .NET regular expression engine.

What Are Regular Expressions?

As I said earlier, a regular expression is nothing but a string of special characters that defines a pattern. We use that pattern in our C# program to extract specific information from a large block of text.

Why Do We Need Regular Expressions?

As .NET developers we work on different types of applications, such as web, mobile or desktop apps. One thing all of them have in common is taking input from users. Users might enter wrong input, intentionally or unintentionally, and it is our duty as developers to validate that input before we process it and store it in a database. Wrong input might crash our applications, so we use regular expressions to validate it. That is only one reason. More generally, regular expressions give us a powerful and fast way to parse and manipulate text. Another plus point is that the regular expression syntax is the same for all types of .NET applications.

Uses of Regular Expressions

There are many practical uses of regular expressions. Here are some common ones:
·    Regular expressions can be used to validate and manipulate user input.
·    Regular expressions can be used to replace, remove and pull out values from text input.
·    You can use regular expressions to pull specific data out of an HTML document and store it in a database.
·    Regular expressions can be used to find specific words or sentences in a large document instead of reading the whole document.

Let’s Start Learning and Practicing

The best way to learn anything is to start practicing and get your hands dirty before you have learned it completely. There is always something more to learn anyway.

How Do Regular Expressions Work?

Regular expressions are processed by the regular expression engine that ships with the .NET Framework and is exposed through System.Text.RegularExpressions.Regex.
The regular expression engine needs only two things to process text:
1. The regular expression pattern that you define to find text. (Don’t worry, we will learn the syntax of regular expressions later in this article.)
2. The input text that we need to parse.
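To make the two ingredients concrete, here is a minimal sketch that hands a pattern and an input text to the engine. The class name EngineInputsDemo and the method ContainsDigit are made up for this illustration. The pattern \d (any digit) is explained in the syntax section that follows.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original article.
class EngineInputsDemo
{
    // Returns true when the pattern finds at least one match in the input.
    public static bool ContainsDigit(string input)
    {
        string pattern = @"\d";                // 1. the regular expression pattern
        return Regex.IsMatch(input, pattern);  // 2. the input text to parse
    }

    static void Main()
    {
        Console.WriteLine(ContainsDigit("Room 42"));    // True
        Console.WriteLine(ContainsDigit("No numbers")); // False
    }
}
```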


Basic Syntax

Now it’s time to learn the basic syntax of .NET regular expressions so that we can create and use them in our C# programs.

Special Characters

As I said earlier, regular expressions are built from characters with special meanings.
The table below lists some of the most commonly used special characters, based on the MSDN reference.

Character    Meaning
\b           Matches the position at the beginning or end of a word (a word boundary).
\d           Matches any digit character.
\t           Matches a tab character.
\n           Matches a newline character.
\s           Matches any white space character.
.            Matches any single character except a newline.
\w           Matches any word character: a letter, a digit or an underscore.
^            Matches the position at the beginning of the whole string.
$            Matches the position at the end of the whole string (or just before a trailing newline).
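Here is a small sketch that tries a few of the characters from the table against one input string. The class name SpecialCharactersDemo, the helper CountMatches and the sample text are assumptions made for illustration. It uses the static Regex.Matches and Regex.IsMatch methods.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original article.
class SpecialCharactersDemo
{
    // Counts how many times the pattern matches in the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    static void Main()
    {
        string input = "Order 66 shipped";

        Console.WriteLine(CountMatches(input, @"\d"));         // 2  (the digits 6 and 6)
        Console.WriteLine(CountMatches(input, @"\s"));         // 2  (the two spaces)
        Console.WriteLine(CountMatches(input, @"\w"));         // 14 (letters and digits)
        Console.WriteLine(Regex.IsMatch(input, @"^Order"));    // True, the string starts with Order
        Console.WriteLine(Regex.IsMatch(input, @"shipped$"));  // True, the string ends with shipped
        Console.WriteLine(Regex.IsMatch(input, @"^shipped"));  // False, shipped is not at the start
    }
}
```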

Before we look at more special characters, let me walk you through some simple examples that use the characters above so that you feel more comfortable with them. Before that, you need a basic understanding of the Regex class and its methods.

Understanding Regex Class

Regex is a standard .NET class that represents a regular expression. More precisely, it represents an immutable regular expression: the pattern is passed to the constructor as a string, and once the Regex object has been created its pattern cannot be changed. Strings in .NET are immutable in the same way, which means that once a value has been assigned to a string object it cannot be changed afterwards.
Now that we know the Regex class represents a regular expression, we need to create an instance of it in our program so that we can find matches and do more crazy stuff with our text input. To create an instance, we use one of its constructors that takes the regular expression pattern string as an argument.
    Regex regex = new Regex(@"\bimportant\b");

Methods of Regex

Here are some of the most commonly used methods of the Regex class.
IsMatch returns a Boolean value that tells us whether the regular expression held by the Regex instance finds a match in the input text. True means a match was found and false means no match was found.
Matches finds all the matches for the regular expression and returns them as a MatchCollection object.
Replace replaces every match of the regular expression with a given replacement string.
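The three methods can be sketched side by side on a single Regex instance. The class name RegexMethodsDemo, the wrapper methods and the sample input are invented for this illustration. The pattern is just \d from the table above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original article.
class RegexMethodsDemo
{
    // One Regex instance that matches a single digit, reused by every method.
    static readonly Regex Digit = new Regex(@"\d");

    public static bool HasDigit(string input)
    {
        return Digit.IsMatch(input);        // true if at least one digit is found
    }

    public static int CountDigits(string input)
    {
        return Digit.Matches(input).Count;  // number of matches in the MatchCollection
    }

    public static string MaskDigits(string input)
    {
        return Digit.Replace(input, "#");   // every matched digit becomes #
    }

    static void Main()
    {
        string input = "Call 555 or 777";
        Console.WriteLine(HasDigit(input));    // True
        Console.WriteLine(CountDigits(input)); // 6
        Console.WriteLine(MaskDigits(input));  // Call ### or ###
    }
}
```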

If you are interested in learning more about the Regex class and want to explore all of its constructors, properties and methods, see the Regex class documentation on MSDN.

Explaining Simple Expressions with Examples

1. “important” finds the literal text ‘important’ exactly as written

The pattern ‘important’ is a very simple regular expression. It matches the 9 characters ‘i’, ’m’, ’p’, ’o’, ’r’, ’t’, ’a’, ’n’, ’t’ in exactly that sequence. Nothing in the pattern says where a word begins or ends, so it also matches when other characters come directly before or after the sequence, as in unimportant, very-important and importanttt.
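A quick sketch shows the problem. The class name LiteralPatternDemo, the method CountLiteral and the sample sentence are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original article.
class LiteralPatternDemo
{
    // Counts every occurrence of the literal text, even inside longer words.
    public static int CountLiteral(string input)
    {
        return Regex.Matches(input, "important").Count;
    }

    static void Main()
    {
        string input = "An important note, an unimportant one and importanttt";

        // Matches inside "important", "unimportant" and "importanttt".
        Console.WriteLine(CountLiteral(input)); // 3
    }
}
```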

The example above shows the weak point of our expression. Now let’s improve it so that we get what we actually want.
2. “\bimportant\b” finds ‘important’ only as a whole word.
We have improved our expression by adding ‘\b’ before and after it. You already know ‘\b’ from the table above. It is a special character that tells the regular expression engine to start the match at the beginning of a word and to stop it at the end of a word. In simple terms, ‘\b’ represents the position at the beginning or end of a word.

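using System;
using System.Text.RegularExpressions;

class JustFind
    {
        static void Main(string[] args)
        {
            // \b on both sides restricts the match to the whole word
            string pattern = @"\bimportant\b";
            string inputString = "Some important text to find unimportant stuff";

            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            MatchCollection matches = regex.Matches(inputString);

            Console.WriteLine("\tAll Matches");

            // Print the text of each match that was found
            foreach (Match match in matches)
            {
                Console.WriteLine(match.Value);
            }

            // Keep the console window open until Enter is pressed
            Console.ReadLine();
        }
    }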

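Result:
The program prints the heading followed by a single match, important. The word unimportant is skipped because there is no word boundary between ‘un’ and ‘important’.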
3. Example of the ‘\s’ character.

This example shows the purpose and use of the ‘\s’ character. As mentioned in the special characters table above, ‘\s’ represents a white space character in text. Using ‘\s’, we will create a regular expression that replaces the spaces between words with the ’_’ character.
using System;
using System.Text.RegularExpressions;

class JustFind
    {
        static void Main(string[] args)
        {
            // \s matches a single white space character
            string pattern = @"\s";
            string inputString = "Some important text to find unimportant stuff";
            Regex regex = new Regex(pattern);
            // Replace every white space character with an underscore
            string result = regex.Replace(inputString, "_");
            Console.WriteLine(result);
            Console.ReadLine();
        }
    }
Result:
Some_important_text_to_find_unimportant_stuff

4. Finding the Number of Words ("\b\w+\b").
In this example we will write an expression that helps us count the words in a text input. In any text or document, words are separated by spaces or punctuation, so instead of looking for the separators we match the words themselves. The pattern above skips the spaces between words and picks up every run of one or more word characters that starts and ends at a word boundary.


Pattern      Meaning
\b           The match must start at the beginning of a word.
\b\w         The word must start with a word character (a letter, a digit or an underscore).
\b\w+        The + repeats the previous element one or more times, so the word we match must contain at least one character.
\b\w+\b      The final \b means the match must also end at the end of a word, right after its last word character.


using System;
using System.Text.RegularExpressions;

class JustFind
    {
        static void Main(string[] args)
        {
            // One or more word characters between two word boundaries
            string pattern = @"\b\w+\b";
            string inputString = "Some important text to find unimportant stuff";
            Regex regex = new Regex(pattern);
            // Each match is one word, so the match count is the word count
            int totalWords = regex.Matches(inputString).Count;
            Console.WriteLine($"Words Count: {totalWords}");
            Console.ReadLine();
        }
    }
Result:
Words Count: 7
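One thing to keep in mind is that this pattern counts runs of word characters, not words as a dictionary would define them. The sketch below, with a made-up class name WordCountDemo and made-up sample inputs, shows that punctuation is skipped but an apostrophe splits a word in two.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original article.
class WordCountDemo
{
    // Counts runs of word characters delimited by word boundaries.
    public static int CountWords(string input)
    {
        return Regex.Matches(input, @"\b\w+\b").Count;
    }

    static void Main()
    {
        Console.WriteLine(CountWords("Hello, world!")); // 2, punctuation is not part of a word
        Console.WriteLine(CountWords("don't stop"));    // 3, the apostrophe splits "don" and "t"
        Console.WriteLine(CountWords(""));              // 0, nothing to match
    }
}
```

If contractions should count as one word, the pattern would need to allow an apostrophe inside the word, which goes beyond the basic syntax covered so far.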