Recently, while writing code, I found that snprintf() does not behave the same way across different compilers.

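snprintf() works just like printf()/fprintf()/sprintf(). You give it a format string and a variable number of additional arguments, and it fills those arguments, in order, into the fields of the format string that begin with %, formatted as each field specifies. printf() and fprintf() send the result to STDOUT or to a given FILE stream, while sprintf() stores the result in its first argument, a C-style string buffer. However, the way sprintf() is defined gives the library implementing it no way to learn the size of that buffer, so if the result is longer than the buffer, it causes a buffer overflow. That is why snprintf() takes an extra second argument, the size of the buffer, to avoid this problem.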
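As a minimal sketch of that difference (the helper name format_id is hypothetical and not part of any library), the only change at the call site is the extra size argument, which lets snprintf() stop before running off the end of the destination:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "id=<id>" into dst, which holds dst_size bytes.
 * sprintf(dst, "id=%d", id) could not know how big dst is; snprintf()
 * receives the size and never writes more than dst_size bytes, '\0' included.
 * Returns snprintf()'s result unchanged. */
int format_id(char *dst, size_t dst_size, int id)
{
    assert(dst != NULL && dst_size > 0);
    return snprintf(dst, dst_size, "id=%d", id);
}
```

With a 5-byte buffer and id 123456, a C99 library stores "id=1" plus the terminator and returns 9, the full length of "id=123456".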

What does snprintf() write when the buffer is too small?

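In normal use, snprintf() is very handy[1], but what happens when the buffer is too small? For example, the following program first fills a buffer buf entirely with 'x', sets its last character to the null character ('\0'), calls snprintf(), and then prints the return value and the contents of the buffer: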

#include <stdio.h>
#include <string.h>

#ifdef _MSC_VER
#   define snprintf _snprintf
#endif

int main()
{
    char buf[16];
    int  ret;
    memset(buf, 'x', sizeof(buf));               // fill with 'x'
    buf[(sizeof(buf) / sizeof(buf[0])) - 1] = 0; // make last char null
    ret = snprintf(buf, 4, "%s", "0123456789");
    printf("ret: %d\n", ret);
    printf("buf: %s\n", buf);
    return 0;
}

With GCC, the output is:

ret: 10
buf: 012

But with VC6, the output is, surprisingly:

ret: -1
buf: 0123xxxxxxxxxxx

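The two behave completely differently: not only do the return values differ, but so does what actually ends up in the buffer. snprintf() is admittedly not a standard function in C89, but it has at least been part of the standard since C99.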

What each side says

Let's first look at what each of them says:

FreeBSD's man page describes snprintf() like this:

PRINTF(3)              FreeBSD Library Functions Manual              PRINTF(3)

NAME
     printf, fprintf, sprintf, snprintf, asprintf, vprintf, vfprintf,
     vsprintf, vsnprintf, vasprintf -- formatted output conversion

LIBRARY
     Standard C Library (libc, -lc)

SYNOPSIS
     #include <stdio.h>

     ...

     int
     snprintf(char * restrict str, size_t size, const char * restrict format,
         ...);

     ...

     int
     vsnprintf(char * restrict str, size_t size, const char * restrict format,
         va_list ap);

DESCRIPTION

     ...

     These functions return the number of characters printed (not including
     the trailing `\0' used to end output to strings) or a negative value if
     an output error occurs, except for snprintf() and vsnprintf(), which
     return the number of characters that would have been printed if the size
     were unlimited (again, not including the final `\0').

     ...

     The snprintf() and vsnprintf() functions will write at most size-1 of the
     characters printed into the output string (the size'th character then
     gets the terminating `\0'); if the return value is greater than or equal
     to the size argument, the string was too short and some of the printed
     characters were discarded.  The output is always null-terminated.

     ...

Microsoft Visual C++'s MSDN says this (VC6):

Return Value

_snprintf returns the number of bytes stored in buffer, not counting the
terminating null character.  If the number of bytes required to store the data
exceeds count, then count bytes of data are stored in buffer and a negative
value is returned.  _snwprintf returns the number of wide characters stored in
buffer, not counting the terminating null wide character.  If the storage
required to store the data exceeds count wide characters, then count wide
characters are stored in buffer and a negative value is returned.
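To see that contract in action on a platform whose snprintf() follows C99, here is a hypothetical emulation (vc6_style_snprintf is an illustrative name, and the 256-byte scratch buffer is an assumption of the sketch). It follows the quoted rules, plus the exact-fit case (length equal to count: stored, not terminated, length returned) that the MSDN comment in a later listing spells out:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical emulation of the VC6 _snprintf() contract, built on C99
 * vsnprintf() so it can be tried anywhere:
 *   - len <  count: store len chars, append '\0', return len;
 *   - len == count: store len chars, NO '\0', return len;
 *   - len >  count: store count chars, NO '\0', return a negative value. */
int vc6_style_snprintf(char *buf, size_t count, const char *fmt, ...)
{
    char tmp[256];                 /* assumption: output is under 256 chars */
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(tmp, sizeof(tmp), fmt, ap);
    va_end(ap);
    assert(len >= 0 && (size_t)len < sizeof(tmp));

    if ((size_t)len > count) {
        memcpy(buf, tmp, count);   /* count bytes, no terminator */
        return -1;
    }
    memcpy(buf, tmp, (size_t)len);
    if ((size_t)len < count)
        buf[len] = '\0';           /* terminator only when there is room */
    return len;
}
```

Fed the opening example (a 16-byte buffer of 'x' with a final '\0', count 4, "0123456789"), it returns -1 and leaves "0123xxxxxxxxxxx" in the buffer, matching the VC6 output shown earlier.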

And C99 says this:

    7.19.6.5 The snprintf function

1   Synopsis

          #include <stdio.h>
          int snprintf(char * restrict s, size_t n,
               const char * restrict format, ...);

    Description

2   The snprintf function is equivalent to fprintf, except that the output is
    written into an array (specified by argument s) rather than to a stream.
    If n is zero, nothing is written, and s may be a null pointer.  Otherwise,
    output characters beyond the n-1st are discarded rather than being written
    to the array, and a null character is written at the end of the characters
    actually written into the array.  If copying takes place between objects
    that overlap, the behavior is undefined.

    Returns

3   The snprintf function returns the number of characters that would have
    been written had n been sufficiently large, not counting the terminating
    null character, or a negative value if an encoding error occurred.  Thus,
    the null-terminated output has been completely written if and only if the
    returned value is nonnegative and less than n.
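The last sentence of the quote translates directly into a check. A small sketch (both function names are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* C99 7.19.6.5p3: the null-terminated output was completely written
 * if and only if ret is nonnegative and less than n. */
int snprintf_complete(int ret, size_t n)
{
    return ret >= 0 && (size_t)ret < n;
}

/* Hypothetical usage: format an int into buf and report whether it fit. */
int format_int_fits(char *buf, size_t n, int value)
{
    assert(n == 0 || buf != NULL);  /* C99: buf may be NULL only if n == 0 */
    return snprintf_complete(snprintf(buf, n, "%d", value), n);
}
```

With an 8-byte buffer, 1234567 (7 digits) fits, while 12345678 (8 digits) does not, because the terminator needs the eighth byte. Under VC6's _snprintf() the same check still flags every incomplete result: a negative return fails it, and so does an exact fit (a return value equal to n), which _snprintf() leaves unterminated.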

Difference 1: the return value of snprintf()

First, regarding the return value, C99's wording is a bit of a mouthful:

The snprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred.

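To avoid misreading it because of my limited English, I specifically asked lukhnos, who confirmed that C99 means: no matter what n is, snprintf() returns the length it would have written had n been large enough.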
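A short sketch of what that means in practice (would_be_length is a hypothetical helper): under C99 the return value depends only on the format and its arguments, so asking with n == 0 and a NULL buffer gives the same answer as the truncating call in the opening example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the length "%s" would produce for s. With n == 0,
 * C99 lets the buffer be NULL and writes nothing, yet snprintf() still
 * returns the full would-be length, the same value it returns for any n. */
int would_be_length(const char *s)
{
    assert(s != NULL);
    return snprintf(NULL, 0, "%s", s);
}
```

Both would_be_length("0123456789") and snprintf(buf, 4, "%s", "0123456789") return 10 on a C99 library.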

FreeBSD's behavior is consistent with C99: "... except for snprintf() and vsnprintf(), which return the number of characters that would have been printed if the size were unlimited ...".

Microsoft Visual C++, however, behaves differently from C99: "If the number of bytes required to store the data exceeds count, then count bytes of data are stored in buffer and a negative value is returned." In other words, whenever the buffer is too small, it simply returns a negative value.

Difference 2: the data snprintf() actually writes

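Besides the return value, Microsoft Visual C++ also differs in what it writes. In the opening example, GCC's snprintf() stored "012\0" into buf, 4 characters in total including the null character, whereas Microsoft Visual C++ stored "0123" into buf with no null character. Had the example not null-terminated buf in advance, printing buf would have gone wrong.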

C99 says: "output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array." In other words, at most n characters are written, including the appended null character.
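The rule can be checked by replaying the opening example on a C99 library. In this sketch (c99_truncation_demo is a hypothetical name), the bytes from buf[4] onward are expected to stay untouched:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replays the opening example under C99 rules (hypothetical name).
 * buf must hold at least 16 bytes. With n == 4, snprintf() stores at most
 * 3 characters plus '\0' and leaves buf[4] onward alone; the return
 * value is still the full length, 10. */
int c99_truncation_demo(char *buf)
{
    assert(buf != NULL);
    memset(buf, 'x', 16);
    buf[15] = '\0';
    return snprintf(buf, 4, "%s", "0123456789");
}
```

Afterwards buf reads "012", buf[3] is '\0', and buf[4] through buf[14] are still 'x'.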

FreeBSD's man page says the same: "The snprintf() and vsnprintf() functions will write at most size-1 of the characters printed into the output string (the size'th character then gets the terminating `\0');..." So in the example program, GCC writes 4 - 1 characters, "012", and then appends a null character, giving "012\0". This behavior conforms to the C99 standard.

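MSDN's description, on the other hand, does not conform to C99: "If the number of bytes required to store the data exceeds count, then count bytes of data are stored in buffer..." The example's output shows that the 4 characters "0123" were stored in buf without a null character, so the rest of buf still holds 'x', followed by the final null character the program set up beforehand, which is the only thing that keeps printing buf from going wrong.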
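One common defensive habit follows from this: always write '\0' into the last byte yourself after the call. A minimal sketch, assuming a hypothetical wrapper name snprintf_terminated; on VC6 it would rely on mapping vsnprintf to _vsnprintf, the same way the article's listings map snprintf:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical wrapper: afterwards buf is always null-terminated, whether
 * vsnprintf() follows C99 (which already terminates) or the VC6 contract
 * (which may fill all n bytes with no terminator). Returns the underlying
 * vsnprintf() result unchanged. */
int snprintf_terminated(char *buf, size_t n, const char *fmt, ...)
{
    va_list ap;
    int ret;

    assert(buf != NULL && n > 0);
    va_start(ap, fmt);
    ret = vsnprintf(buf, n, fmt, ap);
    va_end(ap);
    buf[n - 1] = '\0';  /* never changes a C99 result; adds VC6's missing '\0' */
    return ret;
}
```

With this wrapper, the opening example leaves "012" in buf on either library. The return value still differs (10 versus -1), so callers that care about truncation must still check it.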

Why returning the output length matters

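C99's rule that snprintf() always returns the output length, whether or not the buffer is big enough, is a genuinely useful design, because most of the time we cannot know in advance whether a buffer is big enough. So programs usually have to be written like this: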

#include <stdio.h>
#include <stdlib.h>
#ifdef _MSC_VER
#   include <io.h>
#   define STDOUT_FILENO 1
#   define write         _write
#   define snprintf      _snprintf
#else
#   include <sys/types.h>
#   include <sys/uio.h>
#   include <unistd.h>
#endif

/** Write a "Hello, <name>!\n" message to file descriptor STDOUT_FILENO. */
void write_hello(const char* name)
{
    char* pbuf = 0;
    int   size;

    // Get required size, and allocate enough memory
    size = snprintf(0, 0, "Hello, %s!\n", name);
    pbuf = (char*)malloc(size + 1);

    // Do the formatting
    snprintf(pbuf, size + 1, "Hello, %s!\n", name);

    // Write formatted string
    write(STDOUT_FILENO, pbuf, size);

    // Free allocated memory
    free(pbuf);
}

int main()
{
    write_hello("sign");                        //  4 chars: total 13 chars when write
    write_hello("jeffhung");                    //  8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}

// --[OUTPUT(GCC)]------------------------------------------------------------
// Hello, sign!
// Hello, jeffhung!
// Hello, Honorificabilitudinitatibus!
// --[OUTPUT(VC6)]------------------------------------------------------------
// (crashed)
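A slightly more defensive version of the same two-pass idea checks the measuring call before using it as a size. This is a sketch; format_hello is a hypothetical name, and it returns the string instead of writing it:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'd "Hello, <name>!\n", or NULL.
 * Same two-pass idea as write_hello(), but a negative result from the
 * measuring call (an encoding error under C99, or the too-small-buffer
 * result of VC6's _snprintf(NULL, 0, ...)) is reported as NULL instead
 * of being passed on as a size. The caller frees the result. */
char *format_hello(const char *name)
{
    int size;
    char *p;

    assert(name != NULL);
    size = snprintf(NULL, 0, "Hello, %s!\n", name);  /* pass 1: measure */
    if (size < 0)
        return NULL;
    p = malloc((size_t)size + 1);
    if (p == NULL)
        return NULL;
    snprintf(p, (size_t)size + 1, "Hello, %s!\n", name);  /* pass 2: format */
    return p;
}
```

On VC6 this returns NULL rather than crashing, which is safer but still does not produce the greeting. That is what the trial-and-error approach addresses.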

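With VC6, however, the measuring call snprintf(0, 0, ...) above returns a negative value, which the program then uses as a size, hence the crash. If snprintf() cannot report the required size when the buffer is too small, the only option is trial and error, growing the buffer until the output fits: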

#include <stdio.h>
#include <stdlib.h>
#ifdef _MSC_VER
#   include <io.h>
#   define STDOUT_FILENO 1
#   define write         _write
#   define snprintf      _snprintf
#else
#   include <sys/types.h>
#   include <sys/uio.h>
#   include <unistd.h>
#endif

/** Write a "Hello, <name>!\n" message to file descriptor STDOUT_FILENO. */
void write_hello(const char* name)
{
    char* pbuf = 0;
    int size = 0;
    int len;

    do {
        size += 16;
        printf("[DEBUG] size == %d\n", size);
        // Allocate a buffer, don't know whether it is big enough or not
        pbuf = (char*)realloc(pbuf, size); // will do malloc if pbuf is NULL
        // ------------------------------------------------------------------
        // Do the formatting
        // ------------------------------------------------------------------
        // MSDN:
        // Let len be the length of the formatted data string (not including
        // the terminating null).
        // - If len < count, then len characters are stored in buffer,
        //                   a null-terminator is appended,
        //                   and len is returned.
        // - If len = count, then len characters are stored in buffer,
        //                   no null-terminator is appended,
        //                   and len is returned.
        // - If len > count, then count characters are stored in buffer,
        //                   no null-terminator is appended,
        //                   and a negative value is returned.
        // ------------------------------------------------------------------
        // Since snprintf in VC may not append a null-terminator, we pass
        // (size - 1) as the 2nd parameter and reserve the last buffer
        // element for appending the null-terminator by our self.
        // ------------------------------------------------------------------
        len = snprintf(pbuf, (size - 1), "Hello, %s!\n", name);
        printf("[DEBUG] len == %d\n", len);
    } while (len < 0);
    pbuf[len] = '\0';

    // Write formatted string
    write(STDOUT_FILENO, pbuf, len);

    // Free allocated memory
    free(pbuf);
}

int main()
{
    write_hello("sign");                        //  4 chars: total 13 chars when write
    write_hello("jeffhung");                    //  8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}

// --[OUTPUT(GCC)]------------------------------------------------------------
// [DEBUG] size == 16
// [DEBUG] len == 13
// Hello, sign!
// [DEBUG] size == 16
// [DEBUG] len == 17
// Hello, jeffhun[DEBUG] size == 16
// [DEBUG] len == 36
// Hello, Honorif
// --[OUTPUT(VC6)]------------------------------------------------------------
// [DEBUG] size == 16
// [DEBUG] len == 13
// Hello, sign!
// [DEBUG] size == 16
// [DEBUG] len == -1
// [DEBUG] size == 32
// [DEBUG] len == 17
// Hello, jeffhung!
// [DEBUG] size == 16
// [DEBUG] len == -1
// [DEBUG] size == 32
// [DEBUG] len == -1
// [DEBUG] size == 48
// [DEBUG] len == 36
// Hello, Honorificabilitudinitatibus!

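When we say hello to jeffhung, the 16-byte buffer is not big enough, so the loop runs one extra time. If the buffer grows at a pace that is far out of proportion to the length of the given name, the loop has to run many times, calling realloc() and snprintf() over and over and wasting time. Note also that this version only works with VC's semantics: under GCC, snprintf() returns 17 instead of a negative value, so the loop exits after the first try even though only "Hello, jeffhun" fit, and pbuf[len] = '\0' then writes past the end of the 16-byte buffer. That is why the GCC output above is cut off and mangled.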
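A common way to bound the number of retries (not the approach this article goes on to use) is to double the buffer instead of adding a fixed 16 bytes. A sketch, where hello_doubling and the 1 MiB cap are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define HELLO_MAX_SIZE ((size_t)1 << 20)  /* assumed upper bound */

/* Hypothetical variant of the trial-and-error loop that doubles the buffer
 * on every retry, so the number of attempts grows with log2(length) rather
 * than linearly. "Fits" is tested as 0 <= len < size, which works for both
 * conventions: VC6 returns a negative value (or exactly size, with no '\0')
 * when the output does not fit, C99 returns a value >= size.
 * The cap stops the loop if a C99 library reports an encoding error, which
 * is also negative but would never go away by growing the buffer.
 * Returns a malloc'd string (caller frees) or NULL; *attempts receives the
 * number of snprintf() calls made. */
char *hello_doubling(const char *name, int *attempts)
{
    size_t size = 16;
    char *pbuf = NULL;

    assert(name != NULL && attempts != NULL);
    *attempts = 0;
    while (size <= HELLO_MAX_SIZE) {
        char *p = realloc(pbuf, size);  /* realloc(NULL, n) acts as malloc(n) */
        int len;

        if (p == NULL)
            break;
        pbuf = p;
        ++*attempts;
        len = snprintf(pbuf, size, "Hello, %s!\n", name);
        if (len >= 0 && (size_t)len < size)
            return pbuf;                /* complete and null-terminated */
        size *= 2;
    }
    free(pbuf);
    return NULL;
}
```

For a 200-character name (a 209-character greeting), this sketch tries sizes 16, 32, 64, 128 and 256, so it makes 5 calls. Growing by 16 bytes at a time would need 14.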

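This is why C99's design, in which snprintf() returns the length it would print whether or not the buffer is big enough, is so efficient: a single measuring call tells us exactly how much to allocate.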

The solution

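Taking into account that a local buffer is more efficient than a dynamic one, that the same code should handle both the standard and the non-standard snprintf(), and, finally, that we want to call snprintf() as few times as possible, the program above can be rewritten like this[2]: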

#include <stdio.h>
#include <stdlib.h>
#ifdef _MSC_VER
#   include <io.h>
#   define STDOUT_FILENO 1
#   define write         _write
#   define snprintf      _snprintf
#else
#   include <sys/types.h>
#   include <sys/uio.h>
#   include <unistd.h>
#endif

/** Write a "Hello, <name>!\n" message to file descriptor STDOUT_FILENO. */
void write_hello(const char* name)
{
    char  buf[16];
    char* pbuf      = buf;
    int   pbuf_size = sizeof(buf);
    int   len       = 0;
    int   again     = 0;

    printf("[DEBUG] name == \"%s\"\n", name);

    do {
        if (again) {
#ifdef _MSC_VER
            pbuf_size += sizeof(buf);
#else
            pbuf_size = len + 1;
#endif
            pbuf = (pbuf == buf) ? malloc(pbuf_size)
                                 : realloc(pbuf, pbuf_size);
        }
        printf("[DEBUG] pbuf_size == %d\n", pbuf_size);
        len = snprintf(pbuf, pbuf_size, "Hello, %s!\n", name);
        printf("[DEBUG] len == %d\n", len);
    } while (again = ((len < 0) || (pbuf_size <= len)));
#ifdef _MSC_VER
    pbuf[len] = '\0';
#endif

    printf("[DEBUG] {%d} %s", len, pbuf); // to verify the null-terminator
    write(STDOUT_FILENO, pbuf, len);

    if (pbuf != buf) {
        printf("[DEBUG] free pbuf\n");
        free(pbuf);
    }
}

int main()
{
    write_hello("sign");                        //  4 chars: total 13 chars when write
    write_hello("jeffhung");                    //  8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}

// --[OUTPUT(GCC)]------------------------------------------------------------
// [DEBUG] name == "sign"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] name == "jeffhung"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 17
// [DEBUG] pbuf_size == 18
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// Hello, jeffhung!
// [DEBUG] free pbuf
// [DEBUG] name == "Honorificabilitudinitatibus"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 36
// [DEBUG] pbuf_size == 37
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf
// --[OUTPUT(VC6)]------------------------------------------------------------
// [DEBUG] name == "sign"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] name == "jeffhung"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// Hello, jeffhung!
// [DEBUG] free pbuf
// [DEBUG] name == "Honorificabilitudinitatibus"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 48
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf

Written this way, write_hello() needs the preprocessor in only two places to reconcile the differing snprintf() behaviors.

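However, having to recall how write_hello() was written and copy it every time we use snprintf() would be silly. Instead, we can rework write_hello() and, with the help of vsnprintf(), write strprintf(), which works like snprintf() except that it prints into a C++ std::string instead of a buffer: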

#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string>
#ifdef _MSC_VER
#   include <io.h>
#   define STDOUT_FILENO 1
#   define write         _write
#   define snprintf      _snprintf
#   define vsnprintf     _vsnprintf
#else
#   include <sys/types.h>
#   include <sys/uio.h>
#   include <unistd.h>
#endif

std::string strprintf(const char* fmt, ...)
{
    char    buf[16];
    char*   pbuf      = buf;
    int     pbuf_size = sizeof(buf);
    int     len       = 0;
    int     again     = 0;
    va_list ap;

    do {
        if (again) {
#ifdef _MSC_VER
            pbuf_size += sizeof(buf);
#else
            pbuf_size = len + 1;
#endif
            pbuf = (char*)((pbuf == buf) ? malloc(pbuf_size)
                                         : realloc(pbuf, pbuf_size));
        }
        printf("[DEBUG] pbuf_size == %d\n", pbuf_size);
        // A va_list must not be reused after vsnprintf() has consumed it,
        // so restart the argument list on every attempt and end it again.
        va_start(ap, fmt);
        len = vsnprintf(pbuf, pbuf_size, fmt, ap);
        va_end(ap);
        printf("[DEBUG] len == %d\n", len);
    } while (again = ((len < 0) || (pbuf_size <= len)));
#ifdef _MSC_VER
    pbuf[len] = '\0';
#endif

    printf("[DEBUG] {%d} %s", len, pbuf); // to verify the null-terminator

    std::string str(pbuf);

    if (pbuf != buf) {
        printf("[DEBUG] free pbuf\n");
        free(pbuf);
    }

    return str;
}

void write_hello(const char* name)
{
    // 9 chars: counting ending \n,
    //          but not counting %s replacement and null-terminator
    std::string hello = strprintf("Hello, %s!\n", name);
    write(STDOUT_FILENO, hello.c_str(), hello.length());
}

int main()
{
    write_hello("sign");                        //  4 chars: total 13 chars when write
    write_hello("jeffhung");                    //  8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}

// --[OUTPUT(C99)]------------------------------------------------------------
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 17
// [DEBUG] pbuf_size == 18
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// [DEBUG] free pbuf
// Hello, jeffhung!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 36
// [DEBUG] pbuf_size == 37
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf
// Hello, Honorificabilitudinitatibus!
// --[OUTPUT(VC)]-------------------------------------------------------------
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// [DEBUG] free pbuf
// Hello, jeffhung!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 48
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf
// Hello, Honorificabilitudinitatibus!

This way, as long as we are using C++, strprintf() makes it easy to bring the full power of the printf family to bear on producing formatted strings.
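Plain C has no std::string, but on a C99 library the same idea fits in one short function that returns a malloc'd string. The sketch below (xstrprintf is a hypothetical name) relies on C99 return values, so it needs no retry loop. It uses va_copy() because a va_list that vsnprintf() has consumed must not be reused. BSD and glibc offer asprintf() (listed in the FreeBSD man page above) for the same job, but it is not part of C99.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical C99 counterpart of strprintf(): returns a malloc'd,
 * null-terminated string (caller frees), or NULL on failure.
 * Assumes C99 vsnprintf() semantics, so one measuring call suffices. */
char *xstrprintf(const char *fmt, ...)
{
    va_list ap, ap2;
    int len;
    char *p;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);                          /* second pass needs its own copy */
    len = vsnprintf(NULL, 0, fmt, ap);         /* pass 1: measure */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }
    p = malloc((size_t)len + 1);
    if (p != NULL)
        vsnprintf(p, (size_t)len + 1, fmt, ap2);  /* pass 2: format */
    va_end(ap2);
    return p;
}
```

For example, xstrprintf("Hello, %s! %d", "sign", 42) yields the 15-character string "Hello, sign! 42".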

Notes


  1. The only downside is that in the Visual C++ compilers it is called _snprintf(), for reasons that are, of course, rather odd.
  2. Showing off a little. :-p