Chris Benard

Extract Text from RTF in C#/.Net

August 20, 2014 · Chris Benard · Personal

At work, I was tasked with creating a class to strip RTF tags from RTF formatted text, leaving only the plain text. Microsoft’s RichTextBox can do this with its Text property, but it was unavailable in the context in which I’m working.

RTF formatting uses control words escaped with backslashes along with nested curly braces. Unfortunately, the nesting means I can’t strip the control words with a single regex, since I’d have to track the nested groups on a stack. In addition, some control words should be translated rather than removed, such as newlines and tabs.

Example:

```
{\rtf1\ansi\deff0
{\colortbl;\red0\green0\blue0;\red255\green0\blue0;}
This line is the default color\line
\cf2
This line is red\line
\cf1
This line is the default color
}
```
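To see why a single regex isn’t enough, here is a deliberately naive sketch (the NaiveRtfStripper name is hypothetical, not part of the code below) that deletes every control word and brace in one Regex.Replace pass, with no stack at all.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, deliberately naive stripper: removes control words
// (like \rtf1, \cf2, \line) and braces in one pass, with no stack.
public static class NaiveRtfStripper
{
    // A control word is a backslash, letters, an optional signed number,
    // and an optional single trailing space delimiter.
    private static readonly Regex ControlWordOrBrace =
        new Regex(@"\\[a-z]+-?\d* ?|[{}]", RegexOptions.IgnoreCase);

    public static string Strip(string rtf)
    {
        // Nothing records which group a piece of text belongs to, so text
        // inside destinations such as \colortbl leaks into the output.
        return ControlWordOrBrace.Replace(rtf, string.Empty);
    }
}
```

Run against the example above, the semicolons from the \colortbl group survive as a stray ";;;" line, and \line is simply deleted instead of becoming a newline. Knowing that the color table is an ignorable destination, and that \line means a line break, requires the group-aware processing in the code below.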

Thankfully, Markus Jarderot provided a great answer over at Stack Overflow, but unfortunately for me, it’s written in Python. I don’t know Python, but the code was very readable, so I translated it to C# to the best of my ability.

If this is useful to you, you can download the C# version, or view the original/new code below.

The code in this post is licensed Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0), as is all code on Stack Overflow.

Original Python Code

import re

def striprtf(text):
   pattern = re.compile(r"\\([a-z]{1,32})(-?\d{1,10})?[ ]?|\\'([0-9a-f]{2})|\\([^a-z])|([{}])|[\r\n]+|(.)", re.I)
   # control words which specify a "destination".
   destinations = frozenset((
      'aftncn','aftnsep','aftnsepc','annotation','atnauthor','atndate','atnicn','atnid',
      'atnparent','atnref','atntime','atrfend','atrfstart','author','background',
      'bkmkend','bkmkstart','blipuid','buptim','category','colorschememapping',
      'colortbl','comment','company','creatim','datafield','datastore','defchp','defpap',
      'do','doccomm','docvar','dptxbxtext','ebcend','ebcstart','factoidname','falt',
      'fchars','ffdeftext','ffentrymcr','ffexitmcr','ffformat','ffhelptext','ffl',
      'ffname','ffstattext','field','file','filetbl','fldinst','fldrslt','fldtype',
      'fname','fontemb','fontfile','fonttbl','footer','footerf','footerl','footerr',
      'footnote','formfield','ftncn','ftnsep','ftnsepc','g','generator','gridtbl',
      'header','headerf','headerl','headerr','hl','hlfr','hlinkbase','hlloc','hlsrc',
      'hsv','htmltag','info','keycode','keywords','latentstyles','lchars','levelnumbers',
      'leveltext','lfolevel','linkval','list','listlevel','listname','listoverride',
      'listoverridetable','listpicture','liststylename','listtable','listtext',
      'lsdlockedexcept','macc','maccPr','mailmerge','maln','malnScr','manager','margPr',
      'mbar','mbarPr','mbaseJc','mbegChr','mborderBox','mborderBoxPr','mbox','mboxPr',
      'mchr','mcount','mctrlPr','md','mdeg','mdegHide','mden','mdiff','mdPr','me',
      'mendChr','meqArr','meqArrPr','mf','mfName','mfPr','mfunc','mfuncPr','mgroupChr',
      'mgroupChrPr','mgrow','mhideBot','mhideLeft','mhideRight','mhideTop','mhtmltag',
      'mlim','mlimloc','mlimlow','mlimlowPr','mlimupp','mlimuppPr','mm','mmaddfieldname',
      'mmath','mmathPict','mmathPr','mmaxdist','mmc','mmcJc','mmconnectstr',
      'mmconnectstrdata','mmcPr','mmcs','mmdatasource','mmheadersource','mmmailsubject',
      'mmodso','mmodsofilter','mmodsofldmpdata','mmodsomappedname','mmodsoname',
      'mmodsorecipdata','mmodsosort','mmodsosrc','mmodsotable','mmodsoudl',
      'mmodsoudldata','mmodsouniquetag','mmPr','mmquery','mmr','mnary','mnaryPr',
      'mnoBreak','mnum','mobjDist','moMath','moMathPara','moMathParaPr','mopEmu',
      'mphant','mphantPr','mplcHide','mpos','mr','mrad','mradPr','mrPr','msepChr',
      'mshow','mshp','msPre','msPrePr','msSub','msSubPr','msSubSup','msSubSupPr','msSup',
      'msSupPr','mstrikeBLTR','mstrikeH','mstrikeTLBR','mstrikeV','msub','msubHide',
      'msup','msupHide','mtransp','mtype','mvertJc','mvfmf','mvfml','mvtof','mvtol',
      'mzeroAsc','mzeroDesc','mzeroWid','nesttableprops','nextfile','nonesttables',
      'objalias','objclass','objdata','object','objname','objsect','objtime','oldcprops',
      'oldpprops','oldsprops','oldtprops','oleclsid','operator','panose','password',
      'passwordhash','pgp','pgptbl','picprop','pict','pn','pnseclvl','pntext','pntxta',
      'pntxtb','printim','private','propname','protend','protstart','protusertbl','pxe',
      'result','revtbl','revtim','rsidtbl','rxe','shp','shpgrp','shpinst',
      'shppict','shprslt','shptxt','sn','sp','staticval','stylesheet','subject','sv',
      'svb','tc','template','themedata','title','txe','ud','upr','userprops',
      'wgrffmtfilter','windowcaption','writereservation','writereservhash','xe','xform',
      'xmlattrname','xmlattrvalue','xmlclose','xmlname','xmlnstbl',
      'xmlopen',
   ))
   # Translation of some special characters.
   specialchars = {
      'par': '\n',
      'sect': '\n\n',
      'page': '\n\n',
      'line': '\n',
      'tab': '\t',
      'emdash': u'\u2014',
      'endash': u'\u2013',
      'emspace': u'\u2003',
      'enspace': u'\u2002',
      'qmspace': u'\u2005',
      'bullet': u'\u2022',
      'lquote': u'\u2018',
      'rquote': u'\u2019',
      'ldblquote': u'\u201C',
      'rdblquote': u'\u201D', 
   }
   stack = []
   ignorable = False       # Whether this group (and all inside it) are "ignorable".
   ucskip = 1              # Number of ASCII characters to skip after a unicode character.
   curskip = 0             # Number of ASCII characters left to skip
   out = []                # Output buffer.
   for match in pattern.finditer(text):
      word,arg,hex,char,brace,tchar = match.groups()
      if brace:
         curskip = 0
         if brace == '{':
            # Push state
            stack.append((ucskip,ignorable))
         elif brace == '}':
            # Pop state
            ucskip,ignorable = stack.pop()
      elif char: # \x (not a letter)
         curskip = 0
         if char == '~':
            if not ignorable:
                out.append(u'\xA0')
         elif char in '{}\\':
            if not ignorable:
               out.append(char)
         elif char == '*':
            ignorable = True
      elif word: # \foo
         curskip = 0
         if word in destinations:
            ignorable = True
         elif ignorable:
            pass
         elif word in specialchars:
            out.append(specialchars[word])
         elif word == 'uc':
            ucskip = int(arg)
         elif word == 'u':
            c = int(arg)
            if c < 0: c += 0x10000
            if c > 127: out.append(unichr(c))
            else: out.append(chr(c))
            curskip = ucskip
      elif hex: # \'xx
         if curskip > 0:
            curskip -= 1
         elif not ignorable:
            c = int(hex,16)
            if c > 127: out.append(unichr(c))
            else: out.append(chr(c))
      elif tchar:
         if curskip > 0:
            curskip -= 1
         elif not ignorable:
            out.append(tchar)
   return ''.join(out)

Translated C# Code

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

/// <summary>
/// Rich Text Stripper
/// </summary>
/// <remarks>
/// Translated from Python located at:
/// http://stackoverflow.com/a/188877/448
/// </remarks>
public static class RichTextStripper
{
    private class StackEntry
    {
        public int NumberOfCharactersToSkip { get; set; }
        public bool Ignorable { get; set; }

        public StackEntry(int numberOfCharactersToSkip, bool ignorable)
        {
            NumberOfCharactersToSkip = numberOfCharactersToSkip;
            Ignorable = ignorable;
        }
    }

    private static readonly Regex _rtfRegex = new Regex(@"\\([a-z]{1,32})(-?\d{1,10})?[ ]?|\\'([0-9a-f]{2})|\\([^a-z])|([{}])|[\r\n]+|(.)", RegexOptions.Singleline | RegexOptions.IgnoreCase);

    private static readonly HashSet<string> destinations = new HashSet<string>
    {
        "aftncn","aftnsep","aftnsepc","annotation","atnauthor","atndate","atnicn","atnid",
        "atnparent","atnref","atntime","atrfend","atrfstart","author","background",
        "bkmkend","bkmkstart","blipuid","buptim","category","colorschememapping",
        "colortbl","comment","company","creatim","datafield","datastore","defchp","defpap",
        "do","doccomm","docvar","dptxbxtext","ebcend","ebcstart","factoidname","falt",
        "fchars","ffdeftext","ffentrymcr","ffexitmcr","ffformat","ffhelptext","ffl",
        "ffname","ffstattext","field","file","filetbl","fldinst","fldrslt","fldtype",
        "fname","fontemb","fontfile","fonttbl","footer","footerf","footerl","footerr",
        "footnote","formfield","ftncn","ftnsep","ftnsepc","g","generator","gridtbl",
        "header","headerf","headerl","headerr","hl","hlfr","hlinkbase","hlloc","hlsrc",
        "hsv","htmltag","info","keycode","keywords","latentstyles","lchars","levelnumbers",
        "leveltext","lfolevel","linkval","list","listlevel","listname","listoverride",
        "listoverridetable","listpicture","liststylename","listtable","listtext",
        "lsdlockedexcept","macc","maccPr","mailmerge","maln","malnScr","manager","margPr",
        "mbar","mbarPr","mbaseJc","mbegChr","mborderBox","mborderBoxPr","mbox","mboxPr",
        "mchr","mcount","mctrlPr","md","mdeg","mdegHide","mden","mdiff","mdPr","me",
        "mendChr","meqArr","meqArrPr","mf","mfName","mfPr","mfunc","mfuncPr","mgroupChr",
        "mgroupChrPr","mgrow","mhideBot","mhideLeft","mhideRight","mhideTop","mhtmltag",
        "mlim","mlimloc","mlimlow","mlimlowPr","mlimupp","mlimuppPr","mm","mmaddfieldname",
        "mmath","mmathPict","mmathPr","mmaxdist","mmc","mmcJc","mmconnectstr",
        "mmconnectstrdata","mmcPr","mmcs","mmdatasource","mmheadersource","mmmailsubject",
        "mmodso","mmodsofilter","mmodsofldmpdata","mmodsomappedname","mmodsoname",
        "mmodsorecipdata","mmodsosort","mmodsosrc","mmodsotable","mmodsoudl",
        "mmodsoudldata","mmodsouniquetag","mmPr","mmquery","mmr","mnary","mnaryPr",
        "mnoBreak","mnum","mobjDist","moMath","moMathPara","moMathParaPr","mopEmu",
        "mphant","mphantPr","mplcHide","mpos","mr","mrad","mradPr","mrPr","msepChr",
        "mshow","mshp","msPre","msPrePr","msSub","msSubPr","msSubSup","msSubSupPr","msSup",
        "msSupPr","mstrikeBLTR","mstrikeH","mstrikeTLBR","mstrikeV","msub","msubHide",
        "msup","msupHide","mtransp","mtype","mvertJc","mvfmf","mvfml","mvtof","mvtol",
        "mzeroAsc","mzeroDesc","mzeroWid","nesttableprops","nextfile","nonesttables",
        "objalias","objclass","objdata","object","objname","objsect","objtime","oldcprops",
        "oldpprops","oldsprops","oldtprops","oleclsid","operator","panose","password",
        "passwordhash","pgp","pgptbl","picprop","pict","pn","pnseclvl","pntext","pntxta",
        "pntxtb","printim","private","propname","protend","protstart","protusertbl","pxe",
        "result","revtbl","revtim","rsidtbl","rxe","shp","shpgrp","shpinst",
        "shppict","shprslt","shptxt","sn","sp","staticval","stylesheet","subject","sv",
        "svb","tc","template","themedata","title","txe","ud","upr","userprops",
        "wgrffmtfilter","windowcaption","writereservation","writereservhash","xe","xform",
        "xmlattrname","xmlattrvalue","xmlclose","xmlname","xmlnstbl",
        "xmlopen"
    };

    private static readonly Dictionary<string, string> specialCharacters = new Dictionary<string, string>
    {
        { "par", "\n" },
        { "sect", "\n\n" },
        { "page", "\n\n" },
        { "line", "\n" },
        { "tab", "\t" },
        { "emdash", "\u2014" },
        { "endash", "\u2013" },
        { "emspace", "\u2003" },
        { "enspace", "\u2002" },
        { "qmspace", "\u2005" },
        { "bullet", "\u2022" },
        { "lquote", "\u2018" },
        { "rquote", "\u2019" },
        { "ldblquote", "\u201C" },
        { "rdblquote", "\u201D" },
    };
    /// <summary>
    /// Strip RTF Tags from RTF Text
    /// </summary>
    /// <param name="inputRtf">RTF formatted text</param>
    /// <returns>Plain text from RTF</returns>
    public static string StripRichTextFormat(string inputRtf)
    {
        if (inputRtf == null)
        {
            return null;
        }

        string returnString;

        var stack = new Stack<StackEntry>();
        bool ignorable = false;              // Whether this group (and all inside it) are "ignorable".
        int ucskip = 1;                      // Number of ASCII characters to skip after a unicode character.
        int curskip = 0;                     // Number of ASCII characters left to skip
        var outList = new List<string>();    // Output buffer.

        Match match = _rtfRegex.Match(inputRtf);

        if (!match.Success)
        {
            // Didn't match the regex
            return inputRtf;
        }

        while (match.Success)
        {
            string word = match.Groups[1].Value;
            string arg = match.Groups[2].Value;
            string hex = match.Groups[3].Value;
            string character = match.Groups[4].Value;
            string brace = match.Groups[5].Value;
            string tchar = match.Groups[6].Value;

            if (!String.IsNullOrEmpty(brace))
            {
                curskip = 0;
                if (brace == "{")
                {
                    // Push state
                    stack.Push(new StackEntry(ucskip, ignorable));
                }
                else if (brace == "}")
                {
                    // Pop state
                    StackEntry entry = stack.Pop();
                    ucskip = entry.NumberOfCharactersToSkip;
                    ignorable = entry.Ignorable;
                }
            }
            else if (!String.IsNullOrEmpty(character)) // \x (not a letter)
            {
                curskip = 0;
                if (character == "~")
                {
                    if (!ignorable)
                    {
                        outList.Add("\xA0");
                    }
                }
                else if ("{}\\".Contains(character))
                {
                    if (!ignorable)
                    {
                        outList.Add(character);
                    }
                }
                else if (character == "*")
                {
                    ignorable = true;
                }
            }
            else if (!String.IsNullOrEmpty(word)) // \foo
            {
                curskip = 0;
                if (destinations.Contains(word))
                {
                    ignorable = true;
                }
                else if (ignorable)
                {
                    // Skip control words inside ignorable destination groups
                }
                else if (specialCharacters.ContainsKey(word))
                {
                    outList.Add(specialCharacters[word]);
                }
                else if (word == "uc")
                {
                    ucskip = Int32.Parse(arg);
                }
                else if (word == "u")
                {
                    int c = Int32.Parse(arg);
                    if (c < 0)
                    {
                        c += 0x10000;
                    }
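                    // Cast each UTF-16 code unit instead of calling Char.ConvertFromUtf32,
                    // which throws on surrogates. Characters outside the BMP arrive as two
                    // \u values (a surrogate pair) that form a valid string once both are added.
                    outList.Add(((char)c).ToString());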
                    curskip = ucskip;
                }
            }
            else if (!String.IsNullOrEmpty(hex)) // \'xx
            {
                if (curskip > 0)
                {
                    curskip -= 1;
                }
                else if (!ignorable)
                {
                    int c = Int32.Parse(hex, System.Globalization.NumberStyles.HexNumber);
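                    // Note: this maps the byte straight to a code point (i.e. Latin-1); bytes
                    // 0x80-0x9F differ in Windows-1252, the usual \ansicpg1252 code page.
                    outList.Add(Char.ConvertFromUtf32(c));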
                }
            }
            else if (!String.IsNullOrEmpty(tchar))
            {
                if (curskip > 0)
                {
                    curskip -= 1;
                }
                else if (!ignorable)
                {
                    outList.Add(tchar);
                }
            }

            // Get the next match
            match = match.NextMatch();
        }

        returnString = String.Join(String.Empty, outList.ToArray());

        return returnString;
    }
}

Update: Johnny Lie pointed out some important performance improvements that I have incorporated. Instead of loading all the regex matches at once, the code now iterates through them one by one, which allows larger RTF inputs to be processed successfully. Additionally, thanks to a comment by Spencer Schneidenbach, I have clarified the code license as CC BY-SA 3.0, because the original code came from Stack Overflow.

Conditional Proxying In Chrome Like FoxyProxy

April 1, 2014 · Chris Benard · Personal
This article assumes you already know how to set up a SOCKS proxy, likely via SSH using PuTTY.

Lots of people use FoxyProxy in Firefox to selectively proxy based on rules. FoxyProxy for Chrome exists now, which uses the Chrome Proxy API. However, this still leaks DNS via pre-fetch queries in other places in the Chrome browser (and possibly via other extensions).

If you want to force Chrome to use a conditional proxy and stop DNS leaks, you can use the --host-resolver-rules switch with a series of rules. You can either use FoxyProxy for Chrome if you trust it, or pass your own PAC (Proxy Auto-Configuration) file, which is just a simple JavaScript function.

Create the PAC

Assume you save this file as C:\mypac.pac and you’ve set up a SOCKS5 proxy at localhost:1080.
```
function FindProxyForURL(url, host)
{
    // The "(.*\.)?" pattern ensures it matches the
    //   top level and sub-domains.
    if (/^(.*\.)?nonproxieddomain1\.com$/i.test(host) ||
        /^(.*\.)?nonproxieddomain2\.com$/i.test(host) ||
        /^localhost$/i.test(host)) {
        // Do not proxy
        return "DIRECT";
    }
    else {
        // Go through proxy
        return "SOCKS5 localhost:1080";
    }
}
```
Host Resolver Rules

Chrome has a command line parameter called --host-resolver-rules. This parameter allows you to stop the DNS leaks mentioned above. You use the MAP command to map all addresses to ~NOTFOUND except for addresses you EXCLUDE.

You must keep this in lockstep with the FindProxyForURL function in your PAC file. The resolver rules tell Chrome to leave name resolution to your proxy for every domain except the ones you EXCLUDE. When a regex in the PAC matches, Chrome connects DIRECT, which means it needs regular, local name resolution for that host. If the regex matches but you haven’t added a matching EXCLUDE entry, you will get a “domain not found” or similar name resolution error in Chrome when you try to reach the site.

It’s worth mentioning that the EXCLUDE entries in the resolver rules do not use regex and instead just use a wildcard syntax, so you will need to list each domain twice (once with the wildcard and once without) to match the top level and sub-domains.
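Because the PAC function and the EXCLUDE list must stay in lockstep, one option is to generate both from a single list of domains. The following is a minimal sketch, assuming a hypothetical ProxyRuleBuilder helper (not part of Chrome or FoxyProxy) and the same regex style as the PAC file above; it emits one bare and one wildcard EXCLUDE entry per domain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: builds the --host-resolver-rules value and the
// matching PAC condition from one list of directly-connected domains.
public static class ProxyRuleBuilder
{
    public static string HostResolverRules(IEnumerable<string> directDomains)
    {
        var rules = new List<string> { "MAP * ~NOTFOUND", "EXCLUDE localhost" };
        foreach (string domain in directDomains)
        {
            // EXCLUDE uses wildcards, not regex, so list the bare domain
            // and the "*." form to cover the top level and sub-domains.
            rules.Add("EXCLUDE " + domain);
            rules.Add("EXCLUDE *." + domain);
        }
        return string.Join(",", rules.ToArray());
    }

    // Produces the JavaScript condition for the "DIRECT" branch of FindProxyForURL.
    public static string PacCondition(IEnumerable<string> directDomains)
    {
        // Regex.Escape turns "." into "\." so dots match literally.
        var tests = directDomains
            .Select(d => "/^(.*\\.)?" + Regex.Escape(d) + "$/i.test(host)")
            .ToList();
        tests.Add("/^localhost$/i.test(host)");
        return string.Join(" ||\n    ", tests.ToArray());
    }
}
```

For the two example domains, HostResolverRules returns exactly the --host-resolver-rules value used in the Chrome shortcut below, and PacCondition returns the three regex tests from the PAC file.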

Now that you understand both options and have created your PAC file, close Chrome and re-run it with the new options!

Run Chrome With New Options

You will probably want to change your Chrome shortcut to the following:
```
"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --proxy-pac-url="file:///c:/mypac.pac" --host-resolver-rules="MAP * ~NOTFOUND,EXCLUDE localhost,EXCLUDE nonproxieddomain1.com,EXCLUDE *.nonproxieddomain1.com,EXCLUDE nonproxieddomain2.com,EXCLUDE *.nonproxieddomain2.com"
```
You can now check your IP at a site like IPChicken or by simply Googling “IP”. You can play with excluding those sites from the proxy (but be sure to add them to the EXCLUDE list) and re-checking. One thing to note: Chrome does not pick up PAC changes immediately. You need to go to chrome://net-internals/#proxy and click “Re-apply settings”. You can also clear the DNS cache at chrome://net-internals/#dns.

Prove You Are Completely Proxied

You can prove that you are proxied and your DNS is not leaking by running the following:
```
ipconfig /flushdns
ipconfig /displaydns | find /i "example.org"
```
Then, in Chrome, visit a site that is proxied (not excluded), such as example.org. Then run the following again (do not re-run ipconfig /flushdns):
```
ipconfig /displaydns | find /i "example.org"
```
You should not see any entries for “example.org”. If you repeat the process for a site which is EXCLUDEd and sent DIRECT instead of proxied, you should see it listed in the /displaydns output.

Interesting Threading Issue In .Net

February 28, 2014 · Chris Benard · Personal

Yesterday I noticed a strange anomaly in the logs of an application I wrote and manage at work while investigating another issue. It manifested as us sending duplicate messages from a queue to a third party over and over.

I looked into the code and since threading was involved, I figured there was some thread safety/shared state issue involved. I consulted with a couple of coworkers who didn’t immediately notice any issues, but I created some simple test cases to test my assumptions about threading with lambdas.

Simple Example of the problem

Below you will see me attempting to create 5 workers, do a small unit of work, and then write to the console. I would expect each worker to write out the Int32 it was created with. My assumption was that the C# compiler would, with the lambda expression, create a copy of i for the thread being created and use that copy. I was entirely wrong.

As you can see, when each worker is created on the initial thread, Console.WriteLine prints the right value of i, but when the thread runs, it sees the last value of i, 6 (the for loop increments i after its last iteration, which causes the loop to exit).

var rand = new Random();
var threads = new List<Thread>();

for (int i = 1; i <= 5; i++)
{
    Console.WriteLine("Creating worker {0}.", i);
    
    Thread t = new Thread(() =>
    {
        // Simulate work
        Thread.Sleep(rand.Next(500, 2000));
        
        Console.WriteLine("Finished running worker {0}.", i);
    });
    threads.Add(t);
}

threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());

/* Output:
Creating worker 1.
Creating worker 2.
Creating worker 3.
Creating worker 4.
Creating worker 5.
Finished running worker 6.
Finished running worker 6.
Finished running worker 6.
Finished running worker 6.
Finished running worker 6. */
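The output above happens because the lambda captures the variable i itself, not its value at the moment the thread is created. The sketch below is a hand-written approximation of what the compiler generates; the real generated class has a compiler-reserved name, and CapturedLoop and Closure are hypothetical names. A single closure object holds i, the loop updates that one field, and every delegate reads it later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, hand-written approximation of the compiler's closure
// for the loop above: one shared object holds the captured variable.
public static class CapturedLoop
{
    private sealed class Closure
    {
        public int i;                       // the captured loop variable
        public int Read() { return i; }     // the lambda body, reading i later
    }

    public static List<int> RunWorkersAfterLoop()
    {
        var closure = new Closure();        // created once, before the loop
        var workers = new List<Func<int>>();

        for (closure.i = 1; closure.i <= 5; closure.i++)
        {
            workers.Add(closure.Read);      // every delegate shares one closure
        }

        // By the time the workers run, the loop has left i at 6.
        var results = new List<int>();
        foreach (Func<int> worker in workers)
        {
            results.Add(worker());
        }
        return results;
    }
}
```

RunWorkersAfterLoop returns five 6s, mirroring the "Finished running worker 6." lines.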

Simple Fix

The C# compiler did not make a copy of the state, but we can do this directly and pass it in using ParameterizedThreadStart. This makes the list a collection of Int32/Thread pairs. Obviously, in our actual app, our state object is more complex than an Int32.

var rand = new Random();
var threads = new List<Tuple<int, Thread>>();

for (int i = 1; i <= 5; i++)
{
    Console.WriteLine("Creating worker {0}.", i);
    
    Thread t = new Thread(x =>
    {
        // Simulate work
        Thread.Sleep(rand.Next(500, 2000));
        
        Console.WriteLine("Finished running worker {0}.", x);
    });
    threads.Add(Tuple.Create(i, t));
}

threads.ForEach(tuple => tuple.Item2.Start(tuple.Item1));
threads.ForEach(tuple => tuple.Item2.Join());

/* Output:
Creating worker 1.
Creating worker 2.
Creating worker 3.
Creating worker 4.
Creating worker 5.
Finished running worker 1.
Finished running worker 4.
Finished running worker 2.
Finished running worker 3.
Finished running worker 5. */
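Another common fix, which keeps the original parameterless lambda, is to copy the loop variable into a local declared inside the loop body. Each iteration then gets a fresh variable, so each lambda captures its own copy. A minimal sketch (LocalCopyFix is a hypothetical name) that collects the values into a list instead of writing to the console, so the result is easy to check:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LocalCopyFix
{
    public static List<int> RunWorkers()
    {
        var results = new List<int>();
        var resultsLock = new object();
        var threads = new List<Thread>();

        for (int i = 1; i <= 5; i++)
        {
            // A new variable per iteration; the lambda captures this copy,
            // not the shared loop variable i.
            int workerNumber = i;

            threads.Add(new Thread(() =>
            {
                // List<T> is not thread-safe, so guard the Add.
                lock (resultsLock)
                {
                    results.Add(workerNumber);
                }
            }));
        }

        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());

        results.Sort();  // threads finish in any order
        return results;
    }
}
```

C# 5 changed foreach so that its iteration variable is fresh on each iteration, but a for loop variable like i is still shared across iterations, so the local copy is still needed here.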

Better Fix

That works, but it doesn’t really reflect the original intent. I receive X number of things to do, and in the real product, a Semaphore was used to control the maximum number of messages that were sent at a time.

For instance, if I received 200 messages from the queue to send and I can send 50 messages at a time, I would spin up 200 threads which would wait on a semaphore, sending 50 maximum at a time. Obviously this is inefficient, and I don’t really have an excuse for why I did it this way when I converted it from a single-threaded process that could not keep up with demand to a multi-threaded process which ended up with this duplication problem. In retrospect, I would never have done this.

The following has a queue of 15 work items which is serviced by 5 worker threads and represents close to how the code works now.

Queue With Workers Serving It

var rand = new Random();
var threads = new List<Thread>();
var queueLocker = new object();
var queue = new Queue<int>();
const short maxWorkers = 5;

// Create dummy data for processing
for (int job = 1; job <= 15; job++)
{
    queue.Enqueue(job);
}

for (int i = 1; i <= maxWorkers; i++)
{
    Console.WriteLine("Creating worker {0}.", i);
    
    Thread t = new Thread(() =>
    {
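        // Try to get job from queue to handle
        while (queue.Count > 0)
        {
            // Declared inside the loop so that if another worker empties the
            // queue between the Count check and the lock, this worker does not
            // re-process the job from its previous iteration.
            int? job = null;
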
            lock (queueLocker)
            {
                if (queue.Count > 0)
                {
                    job = queue.Dequeue();
                }
            }
            
            if (job.HasValue)
            {
                // Simulate work
                Thread.Sleep(rand.Next(500, 2000));
                
                Console.WriteLine(
                    "Worker {0} finished running job {1}.",
                    Thread.CurrentThread.Name,
                    job);
            }
        }

        Console.WriteLine(
            "Worker {0} has no more work. Exiting.",
            Thread.CurrentThread.Name);
    });
    t.Name = i.ToString();
    threads.Add(t);
}

threads.ForEach(t => t.Start());
threads.ForEach(t => t.Join());

/* Output:
Creating worker 1.
Creating worker 2.
Creating worker 3.
Creating worker 4.
Creating worker 5.
Worker 4 finished running job 4.
Worker 5 finished running job 5.
Worker 1 finished running job 1.
Worker 2 finished running job 2.
Worker 3 finished running job 3.
Worker 1 finished running job 8.
Worker 2 finished running job 9.
Worker 4 finished running job 6.
Worker 5 finished running job 7.
Worker 3 finished running job 10.
Worker 1 finished running job 11.
Worker 1 has no more work. Exiting.
Worker 2 finished running job 12.
Worker 2 has no more work. Exiting.
Worker 4 finished running job 13.
Worker 4 has no more work. Exiting.
Worker 5 finished running job 14.
Worker 5 has no more work. Exiting.
Worker 3 finished running job 15.
Worker 3 has no more work. Exiting. */

Even Better Fix I Can’t Use

Unfortunately, this code is limited to .Net 3.5 right now, but this particular problem looks like a great match for the Task Parallel Library in .Net 4.0 or later. It would offload all the thread handling to .Net, which is particularly well-suited to this problem: running tasks in parallel.
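For reference, here is a minimal sketch of the queue-and-workers example using the TPL (so .Net 4.0 or later). Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism caps concurrency, playing the role of the five worker threads (or the semaphore) above. TplWorkers is a hypothetical name, and the short sleep stands in for real work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class TplWorkers
{
    public static List<int> ProcessJobs(int jobCount, int maxWorkers)
    {
        var finished = new ConcurrentBag<int>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        // Each job is handed to exactly one worker; the TPL manages the threads
        // and Parallel.ForEach blocks until every job has completed.
        Parallel.ForEach(Enumerable.Range(1, jobCount), options, job =>
        {
            Thread.Sleep(10);  // simulate work
            Console.WriteLine("Thread {0} finished running job {1}.",
                Thread.CurrentThread.ManagedThreadId, job);
            finished.Add(job);
        });

        return finished.OrderBy(j => j).ToList();
    }
}
```

Calling TplWorkers.ProcessJobs(15, 5) processes each of the 15 jobs once, with at most five running at a time, and no hand-written locking around a shared queue.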

How To Fix Verizon FiOS Problem Connecting To Websites

February 25, 2014 · Chris Benard · Personal
Feb 26, 2014 Update:

Verizon appears to have solved the problem on their end and an MTU of 1500 will work again. The original problem lasted from at least the beginning of Feb 24, 2014 until mid-day Feb 26, 2014.

Verizon’s escalation team later called me back and provided the following information:

New uplinks were installed in Lewisville and Plano which were improperly configured and the configuration has been corrected.

The information below remains for historical information on the problem and a workaround.

Verizon Reports the Outage is Resolved

Background on the Problem

I live in the Dallas market for Verizon FiOS, which is where the problems seem to be happening. The issue manifests primarily for me as an inability to play content reliably on YouTube, but I had many other issues on other CDNs. Trying to view other sites, such as MitchRibar.com, would never succeed for me and some other FiOS subscribers.

The error displayed in Chrome was ERR_CONNECTION_RESET.

List of sites affected

The following sites were reported affected by friends, others on the Internet, and me:
- YouTube
- stackoverflow.com
- icloud.com
- appleid.apple.com
- packagist.org
- 500px.com
- blog.iso50.com
- mitchribar.com
- youtrack.jetbrains.com
- upi.com
- soundcloud.com
- azlyrics.com
- EA Games
- Battlefield 4 (Online Gaming including Xbox Live)
- Seemingly random problems on other sites usually when loading CDN resources

Diagnosis

I called Verizon multiple times and they were no help of course. Since a ping worked and since a traceroute exited their network they said that the problem was either on my side or YouTube’s side and didn’t care that it affected multiple sites. I tried to explain the difference between ICMP and a TCP session but they aren’t that smart, of course. They wouldn’t even talk to me until I plugged in their router which is a terrible piece of equipment I haven’t used for years. I obliged and that’s when I got the above from them. They would not let me talk to a higher tier of support.

However, after finding out some friends had the same issue, I was tipped off to this forum post started by another Verizon customer. You’ll see most, if not all, of the posters in the thread are from the Dallas market.

I did some ping tests to validate that the problem is the MTU setting. Somewhere in Verizon’s network, close to the DFW side of the route, someone has messed up the MTU and reduced it from the default of 1500 to 1496. Keep in mind that there are 28 bytes of headers (a 20-byte IPv4 header plus an 8-byte ICMP header), so the largest successful (non-fragmented) ping payload + 28 = MTU.

In the paste below, -f sets the Don't Fragment flag so the packet cannot be fragmented, and -l 1472/1468 sets the ping payload size. Ping parameters differ between platforms; this example is from Windows, so check the parameters for your platform to set these options.
```
>ping mitchribar.com -f -l 1472

Pinging mitchribar.com [205.134.224.227] with 1472 bytes of data:
Request timed out.

Ping statistics for 205.134.224.227:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),
Control-C
^C

>ping mitchribar.com -f -l 1468

Pinging mitchribar.com [205.134.224.227] with 1468 bytes of data:
Reply from 205.134.224.227: bytes=1468 time=42ms TTL=55
```
As you can see, 1472 (which equates to a 1500 MTU [1472+28=1500]) did not work. Lowering it until I got to 1468 worked, which equates to an MTU of 1496 [1468+28=1496]. Because of Verizon’s now-broken network, we must lower the MTU from the default of 1500 to 1496 to ensure the packets traverse the network correctly.
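The manual search above can be automated. Below is a minimal sketch, assuming IPv4 (the 20-byte IP header plus 8-byte ICMP header that make up the 28 bytes) and using System.Net.NetworkInformation.Ping with PingOptions.DontFragment, the programmatic equivalent of ping -f -l. MtuProbe and its methods are hypothetical names; the binary search takes the probe as a delegate so it can be tested without a network, and any non-successful reply (including a timeout, as in the paste) counts as "doesn't fit".

```csharp
using System;
using System.Net.NetworkInformation;

public static class MtuProbe
{
    // IPv4 header (20 bytes) + ICMP echo header (8 bytes).
    public const int HeaderBytes = 28;

    // Binary search for the largest payload in [low, high] for which
    // fits(payload) is true, assuming fits is true up to some size and
    // false above it. Returns low - 1 if nothing fits.
    public static int LargestPayload(Func<int, bool> fits, int low, int high)
    {
        int best = low - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;
            if (fits(mid)) { best = mid; low = mid + 1; }
            else { high = mid - 1; }
        }
        return best;
    }

    // Real probe: one ping with the Don't Fragment flag set.
    public static bool PingFits(string host, int payloadBytes)
    {
        using (var ping = new Ping())
        {
            var options = new PingOptions { DontFragment = true };
            PingReply reply = ping.Send(host, 2000, new byte[payloadBytes], options);
            return reply.Status == IPStatus.Success;
        }
    }

    public static int PathMtu(string host)
    {
        // 1472 + 28 = 1500, the Ethernet default.
        int payload = LargestPayload(size => PingFits(host, size), 1000, 1472);
        if (payload < 1000)
        {
            throw new InvalidOperationException("No probe size between 1000 and 1472 bytes succeeded.");
        }
        return payload + HeaderBytes;
    }
}
```

On a path like the one above, where 1468-byte payloads get through and 1472-byte payloads do not, the search settles on 1468 and PathMtu reports 1496.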

I don’t use Verizon’s router (I use DD-WRT and changing the MTU is easy: Setup -> MTU) but Jake Smith provided these screenshots to me that I edited to show the steps. They come from a regular Verizon FiOS router.

Steps to Change the MTU

You should not need to change router settings unless this problem has happened again after Feb 26, 2014. Please see the note at the top of the page.

The information below remains for historical information on the problem and a workaround.

Connect to your router using a web browser

There are many resources available to find out what your router’s LAN IP address is; on Windows, it is usually the Default Gateway shown by ipconfig. Connect to this address in your web browser. Your FiOS router has the default password printed on it if you have not changed it.

My router’s IP is 192.168.0.1, but yours may (and probably will) be different. With my example, I would navigate to http://192.168.0.1 in my web browser.

Click My Network

Click Network Connections

Click Broadband Connection or the Pencil Icon to Edit

Click Configure Connection

Change MTU to Manual and 1496

System.IO.File.OpenWrite Doesn't Overwrite an Existing File

October 23, 2013 · Chris Benard · Personal
TLDR: The title is pretty much the entire post.

We receive data from a third-party vendor on a nightly basis and we are migrating to a new version of the data. Once the data (a zip file) is downloaded, it must be extracted, and we are using #ziplib to walk the zip file looking for a certain file.

Two versions of our downloading program exist: one for the old version and one for the new version. Suddenly the vendor started sending the new version of the file to the old downloader, which broke our import process. The files are fixed width text formats.

I added some defensive checks on the line length in both versions to ensure we don’t try to process one version’s file with a downloader written for the other, but we continued to have issues with the older process. When I reviewed the file, I could see new data right after the old data, starting after about 1.2 million lines. The vendor said they didn’t see it, and I thought they were wrong, but when I downloaded and extracted the zip file myself, the resulting file looked fine, so I dug deeper.
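A minimal sketch of that kind of defensive check, assuming each record is one line of a known fixed width. The class and method names are hypothetical, and this is not the code we actually run. Checking every line, rather than only the first, is what flags foreign data that shows up only at the end of an otherwise valid file.

```csharp
using System.Collections.Generic;
using System.IO;

static class FixedWidthGuard
{
    // Returns the 1-based number of the first line whose length differs from
    // expectedLength, or 0 if every line matches. Stops at the first mismatch.
    public static int FirstBadLine(IEnumerable<string> lines, int expectedLength)
    {
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            if (line.Length != expectedLength)
            {
                return lineNumber;
            }
        }
        return 0;
    }

    // Streams the file lazily with File.ReadLines, so large files aren't loaded at once.
    public static int FirstBadLine(string path, int expectedLength)
    {
        return FirstBadLine(File.ReadLines(path), expectedLength);
    }
}
```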

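This may be obvious to anyone who has opened a file for writing as a stream, but it was not to me. I used File.OpenWrite, assuming it would overwrite the file or create it, but it does not overwrite. It opens an existing file and starts writing at position 0, leaving any bytes beyond what you write in place. It is equivalent to using FileMode.OpenOrCreate (with FileAccess.Write and FileShare.None).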
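To see the difference concretely, here is a small self-contained sketch (the class name and sample text are made up for illustration). It writes ten bytes to a temporary file, then writes three bytes over that file, first with `File.OpenWrite` and then with `FileMode.Create`.

```csharp
using System;
using System.IO;
using System.Text;

static class OpenWriteDemo
{
    // Writes `text` to `path` using the supplied way of opening the stream,
    // then returns the file's full contents.
    static string WriteAndRead(string path, string text, Func<string, FileStream> open)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        using (FileStream stream = open(path))
        {
            stream.Write(bytes, 0, bytes.Length);
        }
        return File.ReadAllText(path, Encoding.ASCII);
    }

    // Returns (contents after OpenWrite, contents after FileMode.Create).
    public static Tuple<string, string> Run()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "0123456789", Encoding.ASCII);
            // OpenOrCreate: writing starts at position 0; the old bytes after it remain.
            string afterOpenWrite = WriteAndRead(path, "abc", File.OpenWrite);

            File.WriteAllText(path, "0123456789", Encoding.ASCII);
            // Create: an existing file is truncated to zero length before writing.
            string afterCreate = WriteAndRead(path, "abc",
                p => File.Open(p, FileMode.Create, FileAccess.Write, FileShare.Read));

            return Tuple.Create(afterOpenWrite, afterCreate);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

`OpenWriteDemo.Run()` returns `abc3456789` for the first case and `abc` for the second, which reproduces our leftover-tail problem in miniature.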

This is the code as I had it.
```
using (FileStream outStream = File.OpenWrite(extractedFilename))
{
    // use the stream
}
```
I modified it to explicitly specify FileMode.Create, which overwrites (truncates) an existing file, along with FileAccess and FileShare options to be explicit about how I want the file to be opened and shared.
```
using (FileStream outStream = File.Open(extractedFilename, FileMode.Create, FileAccess.Write, FileShare.Read))
{
    // use the stream
}
```
This corrected our issue. Thankfully it wasn’t a bigger problem: the file we receive is always growing, so each extraction normally overwrote every byte of the previous one. However, the new dataset is larger than the old one, so when we began writing at position 0, excess data was left at the end of the file.

I thought it was an interesting mistake, which led to some strange results that weren’t immediately attributable to a specific problem.
How to Fix the Dilbert.com RSS Feed

June 28, 2013 Chris BenardPersonal
Dilbert.com Hates Its Users

Update 2: Dilbert support has been removed due to current events relating to its author.

Update 1: Yahoo Pipes is shutting down. As a result, I started a new software project to replace it. The feeds are updated below.

Bad RSS Feed with No Pictures

Dilbert.com recently messed up their web comic RSS feed that I use to read the comic every day. They did this so that you have to click through to the website to read each and every comic. All you can see is what’s shown in the screenshot to the right.

Penny-Arcade did this quite a while back too, and I’ve been using someone’s Yahoo Pipe that ~~fixes this same problem~~; unfortunately, this pipe no longer works anyway and is replaced below. It just pulls the comic strip images from the page and displays them instead. I never dug into how it worked or into Yahoo Pipes at all.
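The pipe’s internals aren’t documented here, so the following is only a guess at the general technique, sketched in C#. Given a comic page’s HTML, it picks out image URLs with a regular expression and builds feed-item HTML around them. The `urlMustContain` filter is a hypothetical stand-in for whatever distinguishes the strip image on a given site, and a real HTML parser would be more robust than a regex.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

static class ComicImageExtractor
{
    // Captures the src attribute of <img> tags (double- or single-quoted).
    static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*[""']([^""']+)[""']",
        RegexOptions.IgnoreCase);

    // Returns image URLs from pageHtml that contain urlMustContain
    // (a hypothetical per-site filter), in page order, without duplicates.
    public static List<string> FindImages(string pageHtml, string urlMustContain)
    {
        return ImgSrc.Matches(pageHtml)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Where(url => url.IndexOf(urlMustContain, StringComparison.OrdinalIgnoreCase) >= 0)
            .Distinct()
            .ToList();
    }

    // Builds the HTML that a feed item's description could carry instead of a bare link.
    public static string ToFeedHtml(IEnumerable<string> imageUrls)
    {
        return string.Concat(imageUrls.Select(
            url => "<p><img src=\"" + WebUtility.HtmlEncode(url) + "\" /></p>"));
    }
}
```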

~~Yahoo Pipes is pretty awesome. It lets you just drag and drop things and connect them to arrange a workflow. I’m a developer, but I don’t think you have to be to use Pipes.~~

~~Using the Penny-Arcade one as an example, I rigged up two versions of the Dilbert.com feed.~~

Using my new software project, hosted on Google App Engine, I’ve replaced the old feeds and added new ones below. The feeds auto-update every 2 hours and the service serves cached copies, so they should be much faster than the old Pipes feeds.
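The “update every 2 hours, serve a cached copy” behavior amounts to a time-based cache. The service’s actual code isn’t shown here, so this is only a C# sketch of the idea. `TimedCache`, its refresh delegate, and the injectable clock are hypothetical names chosen for illustration; the clock parameter exists only to make the expiry testable.

```csharp
using System;

// Serves a cached value and rebuilds it only when it is older than maxAge.
sealed class TimedCache<T>
{
    private readonly Func<T> _refresh;
    private readonly TimeSpan _maxAge;
    private readonly Func<DateTime> _utcNow;
    private readonly object _gate = new object();
    private T _value;
    private DateTime _fetchedAt;
    private bool _hasValue;

    public TimedCache(Func<T> refresh, TimeSpan maxAge, Func<DateTime> utcNow = null)
    {
        _refresh = refresh;
        _maxAge = maxAge;
        _utcNow = utcNow ?? (() => DateTime.UtcNow);
    }

    public T Get()
    {
        lock (_gate)
        {
            DateTime now = _utcNow();
            if (!_hasValue || now - _fetchedAt >= _maxAge)
            {
                _value = _refresh(); // e.g. download and rewrite the feed
                _fetchedAt = now;
                _hasValue = true;
            }
            return _value;
        }
    }
}
```

A feed handler would hold one `TimedCache<string>` per feed, constructed with `TimeSpan.FromHours(2)`, and call `Get()` on each request. This version refreshes lazily, so the first request after expiry waits for the rebuild. A scheduled job, which is what “auto-update” suggests, avoids that wait.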
- Dilbert
  - Modified Atom Feed
  - Transformed (from Atom) Modified RSS Feed. This is a hack. Only use this if you have a programmatic requirement for RSS instead of Atom.
- Penny-Arcade
  - Modified RSS Feed
- W.T. Duck
  - Modified RSS Feed. This was requested by a reader previously.
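Converting Atom to RSS mostly means renaming elements, which is part of why such a conversion is a hack. The sketch below is not the service’s code. It is a minimal C# version using LINQ to XML that maps only a few common elements (`title`, `link`, `id`, `updated`, `content`/`summary`) and drops everything else. The `AtomToRss` name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class AtomToRss
{
    static readonly XNamespace Atom = "http://www.w3.org/2005/Atom";

    // href of the first <link> whose rel is missing or "alternate".
    static string AlternateLink(XElement parent)
    {
        return parent.Elements(Atom + "link")
            .Where(l =>
            {
                string rel = (string)l.Attribute("rel");
                return rel == null || rel == "alternate";
            })
            .Select(l => (string)l.Attribute("href"))
            .FirstOrDefault() ?? "";
    }

    // Text of a child element, or "" if it is missing. Assumes text/html
    // content; type="xhtml" markup would be flattened to its text.
    static string Text(XElement parent, string name)
    {
        return (string)parent.Element(Atom + name) ?? "";
    }

    public static XDocument Convert(XDocument atomFeed)
    {
        XElement feed = atomFeed.Root;
        var channel = new XElement("channel",
            new XElement("title", Text(feed, "title")),
            new XElement("link", AlternateLink(feed)),
            new XElement("description", Text(feed, "subtitle")));

        foreach (XElement entry in feed.Elements(Atom + "entry"))
        {
            string body = Text(entry, "content");
            if (body.Length == 0)
            {
                body = Text(entry, "summary");
            }

            var item = new XElement("item",
                new XElement("title", Text(entry, "title")),
                new XElement("link", AlternateLink(entry)),
                new XElement("guid", new XAttribute("isPermaLink", "false"), Text(entry, "id")),
                // HTML is stored as escaped text, the usual way RSS carries it.
                new XElement("description", body));

            string updated = (string)entry.Element(Atom + "updated");
            if (updated != null)
            {
                // Atom uses RFC 3339 timestamps; RSS 2.0 expects RFC 822-style dates.
                DateTimeOffset when = XmlConvert.ToDateTimeOffset(updated);
                item.Add(new XElement("pubDate", when.ToUniversalTime().ToString("r")));
            }

            channel.Add(item);
        }

        return new XDocument(new XElement("rss", new XAttribute("version", "2.0"), channel));
    }
}
```

Usage would look like `AtomToRss.Convert(XDocument.Load(atomPath)).Save(rssPath)`, where both paths are placeholders.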
What It Looks Like Now With These Modified Feeds

Much Better RSS Feed with Pictures

I no longer use The Old Reader. I now use NewsBlur, and I pay for a premium subscription. I find it to be the best news reader for serious RSS users. You can follow me there.

Now you get comics back the way they used to be (including images) and don’t have to link out to the site to read it. You can click the title in your feed reader to open the comic’s page (for discussion or other purposes).
Pebble Watch

June 21, 2013 Chris BenardPersonal

Meet Pebble

I got the Pebble Smart Watch a couple of weeks ago, and I’m really enjoying it. It’s a watch that connects to your smartphone and lets you get notifications like texts, emails, and phone calls on your wrist without digging out your phone.

It lasts a really long time, about 9 days before I had to recharge it, and it charges with a little magnetic connector. It has lots of watch faces and apps already, thanks to its SDK and developer community. I’m really partial to the calendar face pictured below.

I haven’t worn a watch since 2006, but I’m already used to it again, for what it provides me. Check out the additional pictures below. I changed out the default rubber strap for a NATO strap.

More Pictures

Default watch face

Controlling music

Text message

Uptime

NATO strap with calendar watch face

Original strap with calendar watch face
New Website and Host

June 14, 2013 Chris BenardPersonal

Site

My last post was in April 2012, so it’s probably time for an update. I successfully imported my blog into a static HTML generator called PieCrust, which creates the files serving this site now.

As part of the move, I totally redid the site’s design and functionality. I had a lot of help (personal and technological). You can read more about it in the about the site section, linked at the bottom of every page.

Host

I love the new host, which Juan recommended to me: I’m hosted at Digital Ocean now. I got a great deal, 2 months free, by using the code SSDTWEET. That code should work through June 2013; if it’s later than that when you read this, just google another coupon code.

What you get with Digital Ocean is pretty amazing. You pick your Linux distribution, and they spin up the instance in under a minute. I provided an SSH public key, and they spun the instance up with that key installed and no root password, so I was able to log in securely. Setting everything else up has been a breeze, since I have full root access, and they even have console access (virtual KVM) in case I screw up the firewall rules.

I locked down the firewall, set up nginx (web server) with Google’s pagespeed module and php5-fpm, which is needed for a few dynamic things like the site search. I’ll be posting more about that later.

I get all that for $5/month, and I emailed their support a question about DNS and got a quick reply back. So far, I’m very happy with their lowest end instance.

Old Site

This is what the old site looked like:

Old Site Screenshot
Successful Unlock of Two AT&T iPhones (iPhone 3G and iPhone 4S)

April 10, 2012 Chris BenardPersonal
iTunes Unlock Confirmation

Summary (TL;DR)

This is a bit long but the short and sweet is that you have to:
1. Get AT&T to go to a separate “iPhone Unlock” page before they send the unlocking email. The email is not connected to your IMEI unlock in any way. It’s just instructions.
2. Put a non-AT&T SIM into the iPhone. T-Mobile prepaid works great for this.
3. Wait for the iPhone to display the “Needs Activation” message after it finds the foreign network.
4. Connect to iTunes (twice, in my case) to get the “Congratulations” message. I got an error the first time on each phone after inserting the foreign SIM.
5. No backup/restore is necessary, contrary to AT&T’s instructions.
I was able to get a 3G and 4S unlocked.

The Good News

Last Friday, AT&T announced that they would begin unlocking iPhones that met any of the following conditions if the account was in good standing:
- iPhone is out of contract (24 months since purchase)
- iPhone has been upgraded to a new phone with AT&T upgrade
- iPhone has been purchased with no-commitment pricing
- iPhone contract has been terminated and ETF has been paid
I have two phones that meet those conditions. One is an iPhone 3G I got when I signed up for AT&T. I then upgraded to an iPhone 4 when it came out in June 2010. I sold the iPhone 4 to an individual on Craigslist and then purchased an iPhone 4S from an individual on Craigslist who represented to me that it was not connected to any contract (no commitment pricing). Therefore, my 3G and 4S should be eligible.

First Try

I contacted AT&T Sunday morning to request an unlock on both. As I expected, they thought I was crazy and told me AT&T doesn’t unlock iPhones. I asked them to check with a supervisor or check online documentation. After a brief hold, they confirmed they were unlocking iPhones, but they’d have to connect me to another department. After about an hour on hold, including two times they hung up on me, I reached the second department.

They took my IMEIs (the number that identifies a device on a GSM network) to verify eligibility. My 3G passed. My 4S, I was told, would not be eligible since I didn’t have a receipt and it wasn’t connected to any contract on my account. I appealed this and they spoke with a supervisor. AT&T was able to verify the 4S was not connected to any contract and therefore was eligible. The representative sent me a link to this PDF to explain the rest of the unlock and told me that both phones were unlocked.

The gist of the PDF is that you must backup and restore the iPhone to get it to unlock in iTunes. I knew this was not the normal procedure for other countries that do iPhone unlocking, but I didn’t have enough information to question it yet.

I tried with both iPhones, backing up and restoring, and got no message in iTunes confirming an unlock, like other people from other countries have been getting for years. I thought maybe it was unlocked and I just didn’t know, so on Monday, I paid $10 for a T-Mobile pre-paid MicroSIM so I could test it. Both phones said “Activation Required” and when I plugged them into iTunes, it reported that they were unsupported SIMs. In other words, the phones were not unlocked.

Second Try

Peeved, I called AT&T around 8:30pm on Monday night. I had to start over, and this time the tier 2 representative had no idea what unlocking meant and kept going to her supervisor, who informed me I needed to jailbreak to use the phone on T-Mobile. I eventually got to this supervisor around 10:50pm (yes, those times are right; the hold to get to tiers 2 and 3 was excruciating, and they kept wanting me to hang up and call back, but I refused). He reiterated the point about T-Mobile and jailbreaking. I told him this is not how GSM unlocking works.

What I kept explaining is that the previous representative read to me from her AT&T system that, “After the ASR has verified the IMEI eligibility and submitted it for unlock,” she should go to this other page to send me the previously linked PDF. The PDF is not in any way linked to my IMEI. So I kept telling everyone I spoke to that they were missing a step: in order to unlock the phone, they had to give Apple permission to unlock it, and that’s the part that they weren’t doing. The email itself did nothing other than tell me what would happen if they did unlock it.

Resolution

After arguing for about 15 minutes, since he didn’t understand unlocking but was genuinely trying to help me, Johnathan finally found a link in his system for a separate page that was titled “iPhone Unlock”, hosted by Apple. He put my 3G IMEI into this page after logging in and confirmed the IMEI with me.

I connected the iPhone to iTunes; nothing. I backed it up and restored it (there’s no data on the 3G, so this goes relatively fast); nothing. I put the T-Mobile SIM in, instead of an AT&T SIM, and reconnected to iTunes. This time I got an error that said something like “Unable to activate” (I forgot to take a screenshot).

iPhone 4s on T-Mobile

Success

I disconnected and reconnected, and suddenly I got the image at the top of this post, confirming activation. My iPhone 3G displayed T-Mobile at the top, and I got a text from T-Mobile welcoming me to the network.

Johnathan and I followed the same process with the iPhone 4S, and I got exactly the same results. I didn’t do a backup and restore at all on the 4S. I just put the T-Mobile SIM in after he submitted the unlock to Apple, waited for it to display Activation Required, connected it to iTunes, got that error message, disconnected, and reconnected. I immediately got the “Congratulations” again. A screenshot from the 4S is at the right.

I’m now very happy both phones are unlocked to work on any GSM carrier, even though it took a lot of my time and patience.

Zip File Classes Finally Available in .Net 4.5

April 2, 2012 Chris BenardProgramming, Work

Right now .Net 4.5 is still in beta, but I noticed something that will make developers who must interact with zip files happy: .Net 4.5 will have native support for dealing with zip files. Up until now, the System.IO.Compression namespace only had support for GZipStream and DeflateStream.

I, like many other developers, have been using the fantastic SharpZipLib library, but I don’t like to have dependencies in my projects if I don’t have to. To iterate through a zip file and list its contents while extracting, the code looks something like this (SharpZipLib has a lot of one-liners that allow extracting with events as well, but bear with me):

```
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

partial class Program
{
    private static void ExtractSharp(string zipFile, string extractionLocationSharp)
    {
        Console.WriteLine("Extracting with SharpZipLib");
        Console.WriteLine();

        using (var archive = new ZipFile(zipFile))
        {
            int readCount;
            byte[] buffer = new byte[4096];

            foreach (ZipEntry entry in archive)
            {
                Console.WriteLine("Name: {0}, Size: {1}", entry.Name, entry.Size);

                var extractedPath = Path.Combine(extractionLocationSharp, entry.Name);
                if (entry.IsDirectory)
                {
                    Directory.CreateDirectory(extractedPath);
                }
                else if (entry.IsFile)
                {
                    // Zips don't always contain explicit directory entries,
                    // so make sure the parent folder exists first.
                    Directory.CreateDirectory(Path.GetDirectoryName(extractedPath));

                    using (var zipStream = archive.GetInputStream(entry))
                    using (var outputStream = new FileStream(extractedPath, FileMode.CreateNew))
                    {
                        // Copy the decompressed entry to disk in 4 KB chunks.
                        while ((readCount = zipStream.Read(buffer, 0, buffer.Length)) > 0)
                        {
                            outputStream.Write(buffer, 0, readCount);
                        }
                    }
                }
            }
        }

        Console.WriteLine();
    }
}
```

I haven’t installed the .Net 4.5 beta on my work machine yet, but according to the MSDN documentation, it should look like this:

I don’t know if this compiles in .Net 4.5. I don’t have it installed yet.

```
using System;
using System.IO;
using System.IO.Compression;

partial class Program
{
    private static void ExtractDotNet(string zipFile, string extractionLocationDotNet)
    {
        Console.WriteLine("Extracting with .Net 4.5");
        Console.WriteLine();

        using (var archive = ZipFile.OpenRead(zipFile))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                Console.WriteLine("Name: {0}, Size: {1}", entry.FullName, entry.Length);

                var extractedPath = Path.Combine(extractionLocationDotNet, entry.FullName);

                // I'm not sure if it will create the directories or not.
                // There does not appear to be an IsDirectory or IsFile like in SharpZipLib
                entry.ExtractToFile(extractedPath);
            }
        }

        Console.WriteLine();
    }
}
```

As you can see, it looks a bit cleaner, but the nice part is having it built into the framework instead of relying on yet another assembly.

As noted in the comments, I’m not sure how .Net 4.5 will handle the directory entries or if it ignores them as separate entries. I may be able to test the beta later, but feel free to comment if you know how this works.
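In the released framework, `ExtractToFile` does not create missing parent directories, and a directory entry appears in `Entries` with a `FullName` ending in `/` and an empty `Name`. If you don’t need per-entry control, `ZipFile.ExtractToDirectory(zipFile, destination)` handles both cases in one call. If you do want the loop, here is a hedged sketch. The class and method names are hypothetical, and on the .NET Framework it needs a reference to System.IO.Compression.FileSystem. It creates directories as it goes and rejects entries whose paths would land outside the destination folder.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class DotNetZipExtractor
{
    // Extracts each entry of zipFile under destination, creating folders as needed.
    public static void ExtractAll(string zipFile, string destination)
    {
        // Trailing separator so "C:\out" doesn't also match "C:\outside".
        string root = Path.GetFullPath(destination).TrimEnd(Path.DirectorySeparatorChar)
                      + Path.DirectorySeparatorChar;

        using (ZipArchive archive = ZipFile.OpenRead(zipFile))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string extractedPath = Path.GetFullPath(Path.Combine(root, entry.FullName));
                if (!extractedPath.StartsWith(root, StringComparison.Ordinal))
                {
                    throw new IOException("Entry escapes the destination: " + entry.FullName);
                }

                if (entry.Name.Length == 0)
                {
                    // Directory entry such as "folder/": just make sure it exists.
                    Directory.CreateDirectory(extractedPath);
                    continue;
                }

                // ExtractToFile won't create missing parent folders itself.
                Directory.CreateDirectory(Path.GetDirectoryName(extractedPath));
                entry.ExtractToFile(extractedPath, true); // true = overwrite existing files
            }
        }
    }
}
```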
