
Get link list from HTML page with regular expressions

Here's a little bit of code that I wrote to extract the anchor links from an HTML page using regular expressions. The final line of the example will only work if you're running it in Joseph Albahari's excellent LINQPad, because Dump() is an extension method that LINQPad provides and is not part of .NET itself.

String text = @"<html>
                <head><title>Development Projects</title></head>
                <body>
                <ul>
                    <li><a href=""http://linttrap.domain.com"">linttrap</a></li>
                    <!--
                    <li><a href=""http://help.domain.com"">help</a></li>
                    -->
                    <li><a href=""http://help2.domain.com"">help2</a></li>
                    <li><a href=""http://help3.domain.com"">help3</a></li>
                    <li><a href=""http://gdhelp.domain.com"">gdhelp</a></li>
                    <li><a href=""http://helpadmin.domain.com"">help admin</a></li>
                </ul>
                </body>
            </html>";
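// Capture the value of each double-quoted href attribute (group 1).
Regex linkRegex = new Regex(" href=\"([^\"]*)\"");
List<String> links = new List<String>();
MatchCollection matches = linkRegex.Matches(text);
foreach (Match m in matches) {
    // Groups[0] is the whole match; Groups[1] is just the URL.
    links.Add(m.Groups[1].Value);
}
// Dump() is LINQPad-only; elsewhere, print the list with Console.WriteLine.
links.Dump("Links");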

Here's the output from the Dump() function:

Links
List<String>
http://linttrap.domain.com
http://help.domain.com
http://help2.domain.com
http://help3.domain.com
http://gdhelp.domain.com
http://helpadmin.domain.com
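Note that http://help.domain.com shows up in the list even though its anchor sits inside an HTML comment: the regular expression only sees raw text and knows nothing about comments. Below is a minimal variation that strips <!-- ... --> blocks with a second regex before looking for links. It is a sketch that assumes a plain .NET console project with C# 9+ top-level statements (so Console.WriteLine stands in for LINQPad's Dump()), uses a trimmed copy of the sample page, and still only handles double-quoted href attributes, so treat it as a quick tool rather than a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Trimmed copy of the sample page from this post.
string text = @"<html>
    <body>
    <ul>
        <li><a href=""http://linttrap.domain.com"">linttrap</a></li>
        <!--
        <li><a href=""http://help.domain.com"">help</a></li>
        -->
        <li><a href=""http://help2.domain.com"">help2</a></li>
    </ul>
    </body>
</html>";

// Remove HTML comments first. Singleline lets '.' match newlines so a
// comment spanning several lines is removed; '*?' stops at the first '-->'.
string withoutComments = Regex.Replace(text, "<!--.*?-->", "", RegexOptions.Singleline);

// Same pattern as above: capture the value of each double-quoted href.
Regex linkRegex = new Regex(" href=\"([^\"]*)\"");
List<string> links = new List<string>();
foreach (Match m in linkRegex.Matches(withoutComments))
{
    links.Add(m.Groups[1].Value);
}

// Console.WriteLine stands in for LINQPad's Dump().
foreach (string link in links)
{
    Console.WriteLine(link);
}
```

With the comment removed, only http://linttrap.domain.com and http://help2.domain.com are printed.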


» Similar Posts

  1. ASP.NET MVC with jQuery DynaTree plugin for Checkboxes
  2. ASP.NET MVC DropDownList from Enum
  3. Optimizing CSS in ASP.NET MVC

» Comments

  1. Chetan

    What if we have links in the head tag? We don't need those.

    Chetan — August 30, 2010 7:03 AM
  2. pispipepe

    thanks!!

    thanks!!

    it's example little and functional!

    pispipepe — April 20, 2011 11:45 PM
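Chetan's question about links in the <head> can be handled in the same regex style: pull out the contents of the <body> element first, then run the href pattern over that text only. This is a hedged sketch, assuming a .NET console project with C# 9+ top-level statements, a single <body> element, and double-quoted href attributes; the sample page and its stylesheet <link> are hypothetical. Unusual markup (for example a <body> tag inside a comment or script) will still confuse it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical page with a stylesheet link in <head> that we want to ignore.
string text = @"<html>
    <head>
        <title>Development Projects</title>
        <link rel=""stylesheet"" href=""http://domain.com/site.css"" />
    </head>
    <body>
        <a href=""http://help2.domain.com"">help2</a>
        <a href=""http://help3.domain.com"">help3</a>
    </body>
</html>";

// Grab everything between <body ...> and </body>. Singleline lets '.' span
// newlines; IgnoreCase also accepts <BODY>. Fall back to the whole text if
// no body element is found.
Match body = Regex.Match(text, @"<body[^>]*>(.*)</body>",
    RegexOptions.Singleline | RegexOptions.IgnoreCase);
string searchArea = body.Success ? body.Groups[1].Value : text;

// Same href pattern as the post, applied only to the body.
Regex linkRegex = new Regex(" href=\"([^\"]*)\"");
List<string> links = new List<string>();
foreach (Match m in linkRegex.Matches(searchArea))
{
    links.Add(m.Groups[1].Value);
}

foreach (string link in links)
{
    Console.WriteLine(link);
}
```

Only the two body links are printed; the stylesheet href in <head> is skipped.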
