  • System.IO.File.OpenWrite Doesn't Overwrite an Existing File

    TLDR: The title is pretty much the entire post.

    We receive data from a third-party vendor on a nightly basis, and we are migrating to a new version
    of that data. Once the data (a zip file) is downloaded, it must be extracted, and we use
    #ziplib (SharpZipLib)
    to walk the zip archive looking for a certain file.
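
    As an aside, the walk is roughly the shape below. This is a sketch, not our actual code: the
    downloadedZipPath variable and the TargetFile.txt name are placeholders, and it assumes the
    ICSharpCode.SharpZipLib.Zip and System.IO namespaces are imported.

    ZipFile zip = new ZipFile(downloadedZipPath);
    try
    {
        foreach (ZipEntry entry in zip)
        {
            // Skip directories; look for the one fixed-width file we care about.
            if (entry.IsFile && entry.Name.EndsWith("TargetFile.txt", StringComparison.OrdinalIgnoreCase))
            {
                using (Stream entryStream = zip.GetInputStream(entry))
                {
                    // ...write entryStream out to disk; that write is what the rest of this post is about
                }
            }
        }
    }
    finally
    {
        zip.Close();   // releases the underlying file handle
    }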

    Two versions of our downloading program exist: one for the old version of the data and one for the new.
    Suddenly the vendor started sending the new version of the file to the old downloader, which broke our
    import process. The files are fixed-width text files.

    I added defensive checks on the line length in both versions to ensure we don’t try to process a file
    with a downloader written for the other version, but we continued to have issues with the older process.
    I reviewed the extracted file, and after about 1.2 million lines I could see new data right after the
    old data. The vendor said they didn’t see it, and I thought they were wrong, but when I downloaded the
    zip file myself and extracted it, the resulting file looked fine, so I dug deeper.
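
    For illustration, the guard was along these lines; the method name and the expected length are
    assumptions for the sketch, not our production code.

    // Hypothetical guard: peek at the first line of the extracted fixed-width file and refuse to
    // continue if its length doesn't match the format this downloader understands.
    // Assumes using System.IO; expectedLength would come from the format spec.
    static void EnsureExpectedLineLength(string path, int expectedLength)
    {
        using (StreamReader reader = new StreamReader(path))
        {
            string firstLine = reader.ReadLine();
            if (firstLine != null && firstLine.Length != expectedLength)
            {
                throw new InvalidDataException(string.Format(
                    "Expected {0}-character lines but found a {1}-character line.",
                    expectedLength, firstLine.Length));
            }
        }
    }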

    This may be obvious to anyone who has opened a file for writing as a stream, but it was not to me. I used
    File.OpenWrite, assuming it would
    create the file or overwrite an existing one, but it does not. It is equivalent to opening with
    FileMode.OpenOrCreate, which opens an existing file without truncating it, so any existing bytes beyond
    what you write remain in the file.
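
    A minimal sketch of the behavior (the file name and contents here are made up for the demo):

    // Write a 15-character file, then "overwrite" it through File.OpenWrite.
    string path = "demo.txt";
    File.WriteAllText(path, "OLD-OLD-OLD-OLD");

    using (FileStream outStream = File.OpenWrite(path))       // FileMode.OpenOrCreate: no truncation
    using (StreamWriter writer = new StreamWriter(outStream))
    {
        writer.Write("NEW");                                   // replaces only the first 3 bytes
    }

    Console.WriteLine(File.ReadAllText(path));                 // prints "NEW-OLD-OLD-OLD"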

    This is the code as I had it.

    using (FileStream outStream = File.OpenWrite(extractedFilename))
    {
        // use the stream
    }

    I modified it to explicitly specify FileMode.Create, which creates the file or truncates an existing one,
    along with additional options spelling out how I want the file opened and shared.

    using (FileStream outStream = File.Open(extractedFilename, FileMode.Create, FileAccess.Write, FileShare.Read))
    {
        // use the stream
    }
    

    This corrected our issue. Thankfully it wasn’t a bigger problem: the file we receive is always growing,
    so each night’s write normally covers the previous contents completely. The new dataset is larger than
    the old one, though, so excess new data was left past the end of what we wrote when we began writing at
    position 0.
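
    For what it’s worth, File.Create would also have truncated the existing file; it uses FileMode.Create
    under the hood, though with read/write access and no sharing rather than the write-only, shared-read
    combination above.

    // File.Create(path) also creates or overwrites the file (FileMode.Create),
    // but returns a stream opened with FileAccess.ReadWrite and FileShare.None.
    using (FileStream outStream = File.Create(extractedFilename))
    {
        // use the stream
    }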

    I thought it was an interesting mistake, one that led to some strange results that weren’t immediately
    attributable to a specific cause.