Remove HTML tags from string using regular expression - C#

HTML Tags:


HTML has a simple, logical format. An HTML document is plain text, and tags are used to format that text: for example, to make it bigger, smaller, bold, or underlined.






A tag is written as a less-than sign '<', followed by a word or character, followed by a greater-than sign '>'.


For example: <p>, <title>


Remove HTML tags from string using regular expressions:



Removing the HTML tags from a string leaves the plain text: only the tags are removed, and the text between them is unchanged.



As an example, consider a string that contains both HTML tags and text. To remove the tags, we use the Regex class from the System.Text.RegularExpressions namespace with a pattern that matches tags. Each match is then replaced with an empty string using the ‘Replace’ method. In this way we can remove HTML tags from a string using regular expressions.



To demonstrate, create a console application and write the following code.







static void Main(string[] args)
{
    // sample text with tags
    string str = "<html><head>I am a boy </head><body>I read in University</body></html>";

    // regex which matches tags: '<', any characters other than '>', then '>'
    System.Text.RegularExpressions.Regex rx = new System.Text.RegularExpressions.Regex("<[^>]*>");

    // replace all matches with an empty string
    str = rx.Replace(str, "");

    // print the remaining plain text
    System.Console.WriteLine(str);
}
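If the same tag removal is needed in more than one place, the steps above can be wrapped in a small helper. The following is only a sketch under stated assumptions: the class name HtmlTagStripper, the method name StripTags, and the choice to return an empty string for null input are all hypothetical and are not part of the original example. It uses the same pattern, "<[^>]*>".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names here are illustrative only.
public static class HtmlTagStripper
{
    // Same pattern as the example above: '<', any run of characters
    // other than '>', then '>'. Created once and reused across calls.
    private static readonly Regex TagPattern = new Regex("<[^>]*>");

    // Returns the input with every match of the tag pattern removed.
    // Null or empty input yields an empty string (an assumption of this sketch).
    public static string StripTags(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        return TagPattern.Replace(input, string.Empty);
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<html><head>I am a boy </head><body>I read in University</body></html>";

        // prints: I am a boy I read in University
        Console.WriteLine(HtmlTagStripper.StripTags(html));
    }
}
```

Note that this pattern only looks for text between '<' and '>', so it does not check whether a match is a real HTML tag. For example, the input "1 < 2 and 3 > 2" would lose everything from '<' to '>'. It is best suited to simple, well-formed markup like the sample string.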





