HTML to PDF (or Bitmap, JPEG, etc.) the Cheap Way

9/16/2008

A couple of weeks ago (might have been last week, my memory is terrible), I was looking at Rick Strahl's blog (good read, by the way). It was specifically this item. I won't get into whether I agree or disagree with anything there (even though I did leave a comment). And of course various groups picked up the post and wrote in the comments section about how ASP.Net isn't "free" (or is it FREE... I prefer "free"), how MS is evil, blah, blah, blah. None of that interested me (I've outgrown most of that, with the exception of OSX; I simply can't find a reason to like it... I blame my years of Ubuntu/Red Hat and Windows use for my bias in that regard...). Anyway, what did interest me was a comment where someone said ASP.Net sucks because you have to spend a ton of money for an HTML to PDF library. I'm here to tell you that you don't have to spend a dime (mostly thanks to MS and also iTextSharp).

I love iTextSharp and use it quite often. I can add text, format it, add images (keep that in mind for later), etc. with little to no effort. And if you've used iText at all, iTextSharp is simply a port of it to .Net (yay ports). Anyway, I really like it as a tool. The only issue I had was creating PDFs of websites: you can't automatically get a PDF version of a website... OK, I lied, you can, and here's how: create an image of the website.

"How can I create an image of the website?" you may be asking. Simple, use the WebBrowser class. The WebBrowser class in C# is a Windows Forms control that acts as a web browser, minus some plugin support such as Flash and Java by default. I know, you're shocked that the item does what it's called, but one of the nice abilities of the WebBrowser class is a function called DrawToBitmap. This little function can be used to capture what is currently in the browser (or part of it) and export it to a Bitmap object. So let's see what we need to do to use that function to get our image.

/*
Copyright (c) 2010 James Craig (http://www.gutgames.com)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.*/

#region Usings
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Threading;
using System.Windows.Forms;
#endregion

namespace Utilities.Web.WebPageThumbnail
{
    /// <summary>
    /// Class for taking a screen shot of a web page
    /// </summary>
    public class WebPageThumbnail
    {
        #region Constructor
        /// <summary>
        /// Constructor
        /// </summary>
        public WebPageThumbnail()
        {
        }
        #endregion

        #region Private Variables
        private string FileName;
        private string Url;
        private int Width;
        private int Height;
        #endregion

        #region Public Functions

        /// <summary>
        /// Generates a screen shot of a web site
        /// </summary>
        /// <param name="FileName">File name to save as</param>
        /// <param name="Url">Url to take the screen shot of</param>
        /// <param name="Width">Width of the image (-1 for full size)</param>
        /// <param name="Height">Height of the image (-1 for full size)</param>
        public void GenerateBitmap(string FileName, string Url, int Width, int Height)
        {
            this.Url = Url;
            this.FileName = FileName;
            this.Width = Width;
            this.Height = Height;
            // The WebBrowser control must live on an STA thread, so do the work on one
            Thread TempThread = new Thread(new ThreadStart(CreateBrowser));
            TempThread.SetApartmentState(ApartmentState.STA);
            TempThread.Start();
            // Block until the screen shot has been saved (or the load timed out)
            TempThread.Join();
        }

        #endregion

        #region Private Functions

        /// <summary>
        /// Creates the browser
        /// </summary>
        private void CreateBrowser()
        {
            using (WebBrowser Browser = new WebBrowser())
            {
                Browser.ScrollBarsEnabled = false;
                // Suppress script error dialogs up front; nobody is there to dismiss them
                Browser.ScriptErrorsSuppressed = true;
                // Attach the handler before navigating so the completed event can't be missed
                Browser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(Browser_DocumentCompleted);
                DateTime TimeoutStart = DateTime.Now;
                TimeSpan Timeout = new TimeSpan(0, 0, 10);
                Browser.Navigate(Url);
                // Pump messages until the page has loaded or 10 seconds have passed
                while (Browser.ReadyState != WebBrowserReadyState.Complete)
                {
                    if (DateTime.Now - TimeoutStart > Timeout)
                        break;
                    Application.DoEvents();
                }
            }
        }

        /// <summary>
        /// Called when the browser is completed
        /// </summary>
        /// <param name="sender">The WebBrowser that finished loading</param>
        /// <param name="e">Event args</param>
        void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            WebBrowser Browser = (WebBrowser)sender;
            Browser.ScrollBarsEnabled = false;
            // -1 means "use the full size of the rendered page"
            if (Width == -1)
            {
                Browser.Width = Browser.Document.Body.ScrollRectangle.Width;
            }
            else
            {
                Browser.Width = Width;
            }
            if (Height == -1)
            {
                Browser.Height = Browser.Document.Body.ScrollRectangle.Height;
            }
            else
            {
                Browser.Height = Height;
            }
            using (Bitmap Image = new Bitmap(Browser.Width, Browser.Height))
            {
                Browser.BringToFront();
                Browser.DrawToBitmap(Image, new Rectangle(0, 0, Browser.Width, Browser.Height));
                Image.Save(FileName, ImageFormat.Bmp);
            }
        }

        #endregion
    }
}

You may have noticed that the above code does something a bit odd... It creates a thread. There's a reason for this, namely the WebBrowser object has to be in a thread which is STA (single-threaded apartment). Normal threads in an ASP.Net web site are not set to STA, and as such we need to create a secondary thread that is. So basically the code creates the thread, the thread creates the web browser and waits for it to finish loading the page, and when the web browser is done it calls our DocumentCompleted function, which saves the Bitmap version of the page. That's all there is to it. If you set the width to -1 it will take the width of the website, and if you set the height to -1 it will take the height of the website (in other words, it captures everything). However, you can also set the width/height of the output and it will capture only that area (for instance, if you want to see what is above the fold on a 640x480 screen, you would set width = 640 and height = 480).
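To make the -1 convention concrete, here is a rough sketch of how the class above might be called. The output paths and the URL are placeholders I made up for illustration, and the `Thumbnailer` alias is only there because the class and its namespace share a name.

```csharp
using Thumbnailer = Utilities.Web.WebPageThumbnail.WebPageThumbnail;

public static class ThumbnailExample
{
    public static void Main()
    {
        Thumbnailer Thumbnail = new Thumbnailer();

        // Hypothetical path and URL: capture the entire page (-1 = full width/height)
        Thumbnail.GenerateBitmap(@"C:\Temp\FullPage.bmp", "http://www.example.com", -1, -1);

        // Hypothetical path and URL: capture only what fits on a 640x480 screen
        Thumbnail.GenerateBitmap(@"C:\Temp\AboveTheFold.bmp", "http://www.example.com", 640, 480);
    }
}
```

Each call blocks until the STA thread has finished, so the file should be on disk by the time GenerateBitmap returns (unless the 10 second load timeout was hit first, in which case nothing may have been saved).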

That's all there is to the code really, but it doesn't exactly get us to the PDF. That's where iTextSharp comes in. To be honest, I'm not going to go over that part in detail (as there are tutorials on how to add an image to a PDF using iTextSharp). However, all you need to do is create a new PDF, add the image to it, and output it to a file (or straight into the response stream). With those two items, you have yourself a free HTML to PDF creator... Anyway, try it out, leave feedback, and happy coding.
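For the curious, the PDF half might look something like the sketch below. It assumes the iTextSharp 5.x API (`Document`, `PdfWriter`, `Image.GetInstance`); the `ImageToPdf` class, the `Convert` name, and its parameters are my own invention for this example, not part of iTextSharp. It sizes the PDF page to match the image, which is one choice among several (you could instead scale the image onto a standard page size).

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ImageToPdf
{
    /// <summary>
    /// Hypothetical helper: wraps a single image file in a one page PDF
    /// </summary>
    /// <param name="ImagePath">Image to embed (e.g. the bitmap saved earlier)</param>
    /// <param name="PdfPath">PDF file to create</param>
    public static void Convert(string ImagePath, string PdfPath)
    {
        Image Picture = Image.GetInstance(ImagePath);
        // Place the image at the bottom left corner of the page
        Picture.SetAbsolutePosition(0, 0);

        // Make the page exactly as large as the image, with no margins
        Rectangle PageSize = new Rectangle(Picture.Width, Picture.Height);
        Document Pdf = new Document(PageSize, 0, 0, 0, 0);

        using (FileStream Output = new FileStream(PdfPath, FileMode.Create))
        {
            PdfWriter.GetInstance(Pdf, Output);
            Pdf.Open();
            Pdf.Add(Picture);
            Pdf.Close();
        }
    }
}
```

To send the PDF straight to the browser instead, the same code should work with the response's output stream in place of the FileStream.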



Comments

James Craig
February 10, 2010 5:53 PM

Someone pointed out to me that the Thread.Join function actually doesn't work on an IIS server. If you run into that issue, you can simply remove the Join call. All it does is wait for the thread to end.