.Net Regular Expressions Demystified Part 1
In very simple terms, a regular expression is a group of characters that defines a pattern, and we use that pattern to find the specific information we need in a piece of text.
Some of those characters have special meanings to the regular expression engine, which ships with the .NET Framework and is represented by the System.Text.RegularExpressions.Regex class.
Goal of Article
My goal in this article is to give you a basic understanding of regular expressions in a very short amount of time. I will guide you far enough that you can create and use your own regular expressions in your .NET applications. By the end of this article you will be able to create your own regular expressions to match a standard phone number, social security number, email address, postal code and so on.
Prerequisites
You need a basic understanding of the C# language and some basic concepts of OOP (object-oriented programming).
Let’s Jump into Regular Expressions
Before I start discussing the syntax and the regular expression engine, let me answer some questions that you might have in mind. If you are confused and think regular expressions are hard to learn, trust me: they are not that bad. In fact, they are quite easy to learn and use. Once you are up and running with regular expressions, I believe you will be doing more cool and fun stuff with regex and making full use of the .NET regular expression engine.
What Are Regular Expressions?
As I said earlier, a regular expression is nothing but a string of characters, some of them special, that defines a pattern. We use that pattern in our C# program to extract specific information from a large block of text.
Why We Need Regular Expressions?
As .NET developers we work on different types of applications, such as web, mobile or desktop apps. One thing is common to all of them: they take input from users. Users might intentionally or unintentionally enter wrong input, so it is our duty as developers to validate that input before we process it and store it in a database. Wrong input might crash our applications, so we need regular expressions to validate it.
That is only one reason. There are many others, because regular expressions give us a powerful and fast way to manipulate and parse text. A further plus point is that the regular expression syntax is the same for all types of .NET applications.
Uses of Regular Expressions
There are many practical uses of regular expressions. Here are some common ones:
· Regular expressions can be used to manipulate and validate user input.
· Regular expressions can be used to replace, remove and pull out values from text input.
· You can use regular expressions to parse an HTML document and take out specific data to store in a database.
· Regular expressions can be used to find specific words or sentences in a large document instead of reading the whole document.
Let’s Start Learning and Practicing
The best way to learn anything is to start practicing and get your hands dirty before you have learned it completely. In the end, there is always something more to learn.
How Do Regular Expressions Work?
Regular expressions process text through the regular expression engine that ships with the .NET Framework and is represented by System.Text.RegularExpressions.Regex.
The regular expression engine needs only two things to process text:
1. The regular expression pattern that you define to find text. (Don’t worry, we will learn the syntax of regular expressions later in this article.)
2. The input text that we need to parse.
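The two inputs listed above can be seen in a minimal sketch like the following. The class name EngineInputsDemo and the sample text are made up for illustration; Regex.IsMatch(input, pattern) is the static helper on the Regex class.

```csharp
using System;
using System.Text.RegularExpressions;

class EngineInputsDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        string pattern = @"\d";                // 1. the pattern: any single digit
        string input = "Order 66 has shipped"; // 2. the input text to search

        // The static IsMatch method hands both pieces to the engine
        // and reports whether the pattern occurs anywhere in the input.
        bool found = Regex.IsMatch(input, pattern);
        Console.WriteLine(found); // True
    }
}
```

The pattern is written as a verbatim string (the @ prefix) so that the backslash reaches the regex engine unchanged.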
Basic Syntax
Now it’s time to learn the basic syntax of .NET regular expressions so that we can create and use them in our C# programs.
Special Characters
As I said earlier, regular expressions are built from characters that have special meanings.
The table below lists some of the most commonly used special characters, referenced from MSDN.
| Special character | Description |
|---|---|
| \b | Matches the position at the beginning or end of a word (a word boundary). |
| \d | Matches any digit character. |
| \t | Matches a tab character. |
| \n | Matches a new line character. |
| \s | Matches any white-space character. |
| . | Matches any single character except a new line. |
| \w | Matches any word character (a letter, digit or underscore). |
| ^ | Matches the position at the beginning of the whole string. |
| $ | Matches the position at the end of the whole string. |
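As a rough sketch of how a few of these special characters behave, the program below runs them against a made-up input string. The class name SpecialCharactersDemo and the sample text are illustrative assumptions; the calls used are the static Regex.Match, Regex.Matches and Regex.IsMatch methods.

```csharp
using System;
using System.Text.RegularExpressions;

class SpecialCharactersDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        string input = "Room 42\tis open"; // contains two spaces and one tab

        // \d matches a single digit; the first one found is "4".
        Console.WriteLine(Regex.Match(input, @"\d").Value);   // 4

        // \w matches a single word character; the first one found is "R".
        Console.WriteLine(Regex.Match(input, @"\w").Value);   // R

        // \s matches any white space, so both spaces and the tab count.
        Console.WriteLine(Regex.Matches(input, @"\s").Count); // 3

        // ^ and $ anchor the pattern to the start and end of the string.
        Console.WriteLine(Regex.IsMatch(input, @"^Room"));    // True
        Console.WriteLine(Regex.IsMatch(input, @"open$"));    // True

        // \b stops "is" from matching inside the longer word "this".
        Console.WriteLine(Regex.IsMatch("this", @"\bis\b"));  // False
    }
}
```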
Before we look at more special characters, let me walk through some simple examples that use the characters above so that you get comfortable with them. First, though, you need a basic understanding of the Regex class and its methods.
Understanding Regex Class
Regex is a standard .NET class that represents an immutable regular expression. The pattern is passed to Regex as a string, and once a Regex instance has been created its pattern cannot be changed. In that respect it behaves like the String class, which is also immutable in .NET: immutable means that once a value has been set on the object, it cannot be changed later. To learn more about the String class and its immutable nature, see the .NET documentation for System.String.
Now that we know the Regex class represents a regular expression, we need to create an instance of it in our program so that we can find matches and do more interesting things with our text input. To create an instance of the Regex class, we use one of its constructors that takes the regular expression pattern string as an argument.
    Regex regex = new Regex(@"\bimportant\b");
Methods of Regex
Here are some of the most commonly used methods of the Regex class.
| Method | Description |
|---|---|
| IsMatch | Returns a Boolean value that tells whether the regular expression specified in the Regex instance finds a match in the input text. True means a match was found; false means no match was found. |
| Matches | Finds all the matches of the specified regular expression and returns them as a MatchCollection object. |
| Replace | Replaces all the matches of the specified regular expression with a given replacement string. |
If you are interested in learning more about the Regex class and want to explore all of its constructors, properties and methods, see the Regex class reference on MSDN.
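A short sketch of the three methods working together on one Regex instance is shown below. The class name RegexMethodsDemo and the sample sentence are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class RegexMethodsDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        Regex regex = new Regex(@"\d"); // pattern: any single digit
        string input = "Call 5 or 7";

        // IsMatch: is there at least one digit in the input?
        Console.WriteLine(regex.IsMatch(input));      // True

        // Matches: collect every digit that was found.
        MatchCollection matches = regex.Matches(input);
        Console.WriteLine(matches.Count);             // 2

        // Replace: swap every digit for '#'. The input string itself is not modified.
        Console.WriteLine(regex.Replace(input, "#")); // Call # or #
    }
}
```

Because the same instance is reused for all three calls, the pattern only has to be written once.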
Explaining Simple Expressions with Examples
1. “important” finds the text ‘important’ literally, exactly as written.
The pattern ‘important’ is a very simple form of regular expression. It matches the nine characters ‘i’, ’m’, ’p’, ’o’, ’r’, ’t’, ’a’, ’n’, ’t’ in exactly that sequence. If other characters come directly before or after the sequence, it still matches, which is not what we want: it also finds ‘important’ inside words such as unimportant and importanttt.
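The weakness described above can be reproduced with a small sketch. The class name PlainPatternDemo and the input string are made up for illustration; the program counts how often the plain pattern is found.

```csharp
using System;
using System.Text.RegularExpressions;

class PlainPatternDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        string input = "important, unimportant, importanttt";

        // The plain pattern has no notion of word boundaries,
        // so it is found inside all three words.
        MatchCollection matches = Regex.Matches(input, "important");
        Console.WriteLine(matches.Count); // 3

        // Show where each match starts in the input.
        foreach (Match match in matches)
        {
            Console.WriteLine($"'{match.Value}' at index {match.Index}");
        }
    }
}
```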
The example above shows the weak point of our expression. Now let’s improve it so that we get what we actually want.
2. “\bimportant\b” finds ‘important’ only as a whole word.
We have improved our expression by adding ‘\b’ before and after it. You are already familiar with ‘\b’ from the table above. It is a special character that tells the regular expression engine to start matching at the beginning of a word and to stop at the end of a word. In simple terms, ‘\b’ represents the position at the beginning or end of a word.
    using System;
    using System.Text.RegularExpressions;

    class JustFind
    {
        static void Main(string[] args)
        {
            // \b on both sides restricts the match to the whole word.
            string pattern = @"\bimportant\b";
            string inputString = "Some important text to find unimportant stuff";
            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            MatchCollection matches = regex.Matches(inputString);
            Console.WriteLine("\tAll Matches");
            foreach (Match match in matches)
            {
                Console.WriteLine(match);
            }
            Console.ReadLine();
        }
    }
Result:
Only one match, ‘important’, comes back. The word ‘unimportant’ is skipped because the text ‘important’ inside it does not start at a word boundary.
3. Example of the ‘\s’ character.
This example shows the purpose and use of the ‘\s’ character. As mentioned in the special characters table above, ‘\s’ represents a white-space character in text. Using ‘\s’, we will create a regular expression that replaces the spaces between words with the ’_’ character.
    using System;
    using System.Text.RegularExpressions;

    class JustFind
    {
        static void Main(string[] args)
        {
            // \s matches each white-space character in the input.
            string pattern = @"\s";
            string inputString = "Some important text to find unimportant stuff";
            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            string result = regex.Replace(inputString, "_");
            Console.WriteLine(result);
            Console.ReadLine();
        }
    }
Result:
Some_important_text_to_find_unimportant_stuff
4. Finding the Number of Words ("\b\w+\b").
In this example we will write an expression that helps us count the words in a text input. In any text or document, words are separated by white space, so the expression above skips the spaces between words and picks up every word that starts and ends with a word character and contains one or more characters.
| Pattern | Meaning |
|---|---|
| \b | Start matching at a word boundary, that is, at the beginning of a word. |
| \b\w | The word starts with a word character (a letter, digit or underscore). |
| \b\w+ | The + repeats the previous match one or more times. In simple terms, the word we are going to match must contain at least one character. |
| \b\w+\b | The final \b means the match must also end at a word boundary, that is, at the end of the word. |
    using System;
    using System.Text.RegularExpressions;

    class JustFind
    {
        static void Main(string[] args)
        {
            // One or more word characters between two word boundaries.
            string pattern = @"\b\w+\b";
            string inputString = "Some important text to find unimportant stuff";
            Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
            int totalWords = regex.Matches(inputString).Count;
            Console.WriteLine($"Words Count: {totalWords}");
            Console.ReadLine();
        }
    }
Result:
Words Count: 7