Extract Text from RTF in C#/.Net

At work, I was tasked with creating a class to strip the formatting codes from RTF-formatted text, leaving only the plain text. Microsoft’s RichTextBox control can do this through its Text property, but it wasn’t available in the context I was working in.

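RTF formatting uses control words prefixed with backslashes, organized into groups delimited by nested curly braces. Unfortunately, the nesting means I can’t remove the control words with a single regex. Some groups, such as the color table in the example below, must be dropped entirely, and knowing where such a group ends requires tracking the nesting with a stack. In addition, some control words, such as \par, \line and \tab, should be translated into newline and tab characters rather than deleted.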
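To make the problem concrete, here is a minimal sketch of the single-regex approach. NaiveRtfStrip is a hypothetical name used only for illustration. It deletes every control word (with its optional numeric parameter and one delimiting space) and every brace, which looks reasonable until it meets a group whose contents are not body text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical illustration of the naive approach: one regex, no notion of groups.
public static class NaiveRtfStrip
{
    // Deletes "\word", "\word123" or "\word-5" plus one trailing space, and every '{' or '}'.
    // Everything else, including the contents of groups like \colortbl, is kept.
    public static string Strip(string rtf) =>
        Regex.Replace(rtf, @"\\[a-z]+-?\d* ?|[{}]", "");
}
```

Run on the example document below, this returns "\n;;;\nThis line is the default color\n\nThis line is red\n\nThis line is the default color\n". The color table’s semicolons leak into the text as a stray ";;;" line, and \line is simply deleted; the line breaks that remain are only the raw newlines in the RTF source, which RTF readers ignore.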

Example:

{\rtf1\ansi\deff0
{\colortbl;\red0\green0\blue0;\red255\green0\blue0;}
This line is the default color\line
\cf2
This line is red\line
\cf1
This line is the default color
}
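Here is roughly what processing the stack looks like, as a minimal sketch of my own rather than the translated class from this post. The names RtfStripper and Strip, the short list of skipped destinations, and the Latin-1 treatment of \'hh escapes are all assumptions; Unicode escapes (\uN) are not handled, so their fallback characters pass through as plain text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch: a stack-based RTF-to-text converter for a small subset of RTF.
public static class RtfStripper
{
    // Groups: 1 = control word, 2 = numeric parameter, 3 = \'hh hex byte,
    // 4 = control symbol, 5 = brace, 6 = any other character.
    // Raw CR/LF in the source match the uncaptured [\r\n]+ and are discarded.
    private static readonly Regex Token = new Regex(
        @"\\([a-z]{1,32})(-?\d{1,10})?[ ]?|\\'([0-9a-fA-F]{2})|\\([^a-z])|([{}])|[\r\n]+|(.)");

    // Assumed, deliberately incomplete list of destinations whose contents are not body text.
    private static readonly HashSet<string> SkippedDestinations = new HashSet<string>
    {
        "fonttbl", "colortbl", "stylesheet", "info", "pict", "object"
    };

    public static string Strip(string rtf)
    {
        var output = new StringBuilder();
        var saved = new Stack<bool>(); // parent group's state, pushed at each '{'
        bool ignorable = false;        // true while inside a skipped destination

        foreach (Match m in Token.Matches(rtf))
        {
            string word = m.Groups[1].Value;
            string hex = m.Groups[3].Value;
            string symbol = m.Groups[4].Value;
            string brace = m.Groups[5].Value;
            string ch = m.Groups[6].Value;

            if (brace == "{")
            {
                saved.Push(ignorable); // a new group starts with its parent's state
            }
            else if (brace == "}")
            {
                if (saved.Count > 0) ignorable = saved.Pop(); // restore the parent's state
            }
            else if (symbol == "*")
            {
                ignorable = true; // \* marks the rest of this group as ignorable
            }
            else if (ignorable)
            {
                continue; // nothing inside a skipped destination is emitted
            }
            else if (word.Length > 0)
            {
                if (SkippedDestinations.Contains(word)) ignorable = true;
                else if (word == "par" || word == "line") output.Append('\n');
                else if (word == "tab") output.Append('\t');
                // Other control words (\b, \cf2, \rtf1, ...) only affect formatting.
            }
            else if (symbol.Length > 0)
            {
                switch (symbol)
                {
                    case "~": output.Append('\u00A0'); break; // non-breaking space
                    case "_": output.Append('-'); break;      // non-breaking hyphen, simplified
                    case "-": break;                          // optional hyphen: not rendered
                    case "\r":
                    case "\n": output.Append('\n'); break;    // backslash-newline acts like \par
                    default: output.Append(symbol); break;    // \\, \{ and \} are literals
                }
            }
            else if (hex.Length > 0)
            {
                // Simplification: treat the byte as Latin-1 (matches Windows-1252 except 0x80-0x9F).
                output.Append((char)Convert.ToInt32(hex, 16));
            }
            else
            {
                output.Append(ch); // plain text; empty for discarded raw newlines
            }
        }
        return output.ToString();
    }
}
```

On the example above, Strip returns the three lines of text separated by '\n', with the color table gone. The stack is what makes that possible: entering the \colortbl group flips ignorable on, and the matching closing brace pops the outer group’s state back.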

Thankfully, Markus Jarderot provided a great answer over at Stack Overflow, but unfortunately for me, it’s written in Python. I don’t know Python, but the code was very readable, so I translated it to C# to the best of my ability.

If this is useful to you, you can download the C# version, or compare the original Python code with my C# translation.

Author: Chris Benard

Chris Benard is a software developer in the Dallas area specializing in payments processing, medical claims processing, and Windows/Web services.