Non-standard behavior of (v)snprintf on Visual C++
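While writing code recently, I found that snprintf() behaves differently from one compiler to another.
Like printf()/fprintf()/sprintf(), snprintf() takes a format specifier plus a variable number of extra arguments, and fills those arguments, in order and in the requested format, into the fields of the format specifier that begin with %. printf() and fprintf() send the result to stdout or to a given FILE stream, while sprintf() stores the result in its first argument, a C-style string buffer. The interface of sprintf(), however, gives the library implementing it no way to learn the size of that buffer, so if the result is longer than the actual buffer, a buffer overflow occurs. snprintf() therefore adds a second parameter, the size of the buffer, to avoid this problem.
What does snprintf() write when the buffer is too small?
In normal use snprintf() is very handy [1], but what happens when the buffer is too small? The following program fills a buffer buf entirely with 'x', sets its last character to the null character, calls snprintf(), and then prints the return value and the contents of the buffer: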
#include <stdio.h>
#include <string.h>
#ifdef _MSC_VER
# define snprintf _snprintf
#endif
int main()
{
    char buf[16];
    int ret;
    memset(buf, 'x', sizeof(buf)); // fill with 'x'
    buf[(sizeof(buf) / sizeof(buf[0])) - 1] = 0; // make last char null
    ret = snprintf(buf, 4, "%s", "0123456789"); // only 4 bytes may be used
    printf("ret: %d\n", ret);
    printf("buf: %s\n", buf);
    return 0;
}
With GCC, the output is:
ret: 10
buf: 012
With VC6, however, the output is:
ret: -1
buf: 0123xxxxxxxxxxx
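The two behave completely differently: not only the return value but also the data actually written differs. snprintf() is not a standard function in C89, but it has at least been part of the standard since C99.
What each side says
Let us first look at what each side has to say.
The FreeBSD man page describes snprintf() like this: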
PRINTF(3) FreeBSD Library Functions Manual PRINTF(3)
NAME
printf, fprintf, sprintf, snprintf, asprintf, vprintf, vfprintf,
vsprintf, vsnprintf, vasprintf -- formatted output conversion
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <stdio.h>
...
int
snprintf(char * restrict str, size_t size, const char * restrict format,
...);
...
int
vsnprintf(char * restrict str, size_t size, const char * restrict format,
va_list ap);
DESCRIPTION
...
These functions return the number of characters printed (not including
the trailing `\0' used to end output to strings) or a negative value if
an output error occurs, except for snprintf() and vsnprintf(), which
return the number of characters that would have been printed if the size
were unlimited (again, not including the final `\0').
...
The snprintf() and vsnprintf() functions will write at most size-1 of the
characters printed into the output string (the size'th character then
gets the terminating `\0'); if the return value is greater than or equal
to the size argument, the string was too short and some of the printed
characters were discarded. The output is always null-terminated.
...
The MSDN documentation for Microsoft Visual C++ (VC6) says:
Return Value
_snprintf returns the number of bytes stored in buffer, not counting the terminating null character. If the number of bytes required to store the data exceeds count, then count bytes of data are stored in buffer and a negative value is returned. _snwprintf returns the number of wide characters stored in buffer, not counting the terminating null wide character. If the storage required to store the data exceeds count wide characters, then count wide characters are stored in buffer and a negative value is returned.
And C99 says:
7.19.6.5 The snprintf function
1 Synopsis
#include <stdio.h>
int snprintf(char * restrict s, size_t n,
const char * restrict format, ...);
Description
2 The snprintf function is equivalent to fprintf, except that the output is
written into an array (specified by argument s) rather than to a stream.
If n is zero, nothing is written, and s may be a null pointer. Otherwise,
output characters beyond the n-1st are discarded rather than being written
to the array, and a null character is written at the end of the characters
actually written into the array. If copying takes place between objects
that overlap, the behavior is undefined.
Returns
3 The snprintf function returns the number of characters that would have
been written had n been sufficiently large, not counting the terminating
null character, or a negative value if an encoding error occurred. Thus,
the null-terminated output has been completely written if and only if the
returned value is nonnegative and less than n.
Difference 1: the return value of snprintf()
First, regarding the return value, the wording in C99 is a little convoluted:
The snprintf function returns the number of characters that would have been written had n been sufficiently large, not counting the terminating null character, or a negative value if an encoding error occurred.
To make sure I was not misreading the English, I checked with lukhnos, who confirmed what C99 means: no matter how large n actually is, snprintf() returns the length that would be written if n were large enough.
FreeBSD behaves consistently with C99: "... except for snprintf() and vsnprintf(), which return the number of characters that would have been printed if the size were unlimited ...".
Microsoft Visual C++ does not: "If the number of bytes required to store the data exceeds count, then count bytes of data are stored in buffer and a negative value is returned." In other words, whenever the buffer is too small, a negative value is returned.
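The two return-value conventions quoted above can be folded into a single check. The following is only a sketch; output_is_complete is a name I made up for illustration, and the VC6 side rests solely on the MSDN text quoted in this post.

```c
#include <stddef.h>

/* Hypothetical helper: interprets the return value `ret` of a
 * snprintf()-style call that was given a buffer of `n` bytes.
 *   C99:           ret >= n means the output was truncated,
 *                  ret < 0 means an encoding error.
 *   VC6 _snprintf: ret < 0 means the data did not fit (or an error),
 *                  ret == n means it fit but no terminator was written.
 * Returns 1 only when the output is complete and null-terminated. */
int output_is_complete(int ret, size_t n)
{
    return ret >= 0 && (size_t)ret < n;
}
```

A caller would pass the value returned by snprintf() together with the size it gave to that call, and treat a 0 result as "retry with a bigger buffer or report an error".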
Difference 2: the data snprintf() actually writes
Microsoft Visual C++ differs from the others not only in the return value but also in what it writes. In the example at the very beginning, GCC's snprintf() stores "012\0" in buf, 4 characters in total including the null character, whereas Microsoft Visual C++ stores "0123" in buf with no null character appended. Had the example not taken special care, printing buf would go wrong.
C99 says: "output characters beyond the n-1st are discarded rather than being written to the array, and a null character is written at the end of the characters actually written into the array." In other words, at most n characters are written to the array, including the appended null character.
The FreeBSD man page says the same: "The snprintf() and vsnprintf() functions will write at most size-1 of the characters printed into the output string (the size'th character then gets the terminating `\0');..." So in the example program GCC writes 4 - 1 characters, namely "012", followed by a null character, giving "012\0". This behavior conforms to the C99 standard.
The MSDN description does not match the C99 standard: "If the number of bytes required to store the data exceeds count, then count bytes of data are stored in buffer..." Judging from the example's output, the 4 characters "0123" are stored in buf, but since no null character is appended, the rest of buf is still x, followed by the final null character that was put there to keep printing buf from going wrong.
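One way to paper over the missing terminator is a thin wrapper that always terminates the buffer itself. This is a sketch under assumptions: snprintf_terminated is a hypothetical name, and the _MSC_VER branch assumes, as the listings in this post do, that older Visual C++ offers the function as _vsnprintf.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#ifdef _MSC_VER
# define vsnprintf _vsnprintf  /* assumption: older Visual C++ naming */
#endif

/* Hypothetical wrapper: formats like snprintf(), but whenever size > 0 the
 * buffer is null-terminated on return, even if the underlying library
 * filled all `size` bytes without appending a terminator. The return
 * value is passed through unchanged, so its meaning still differs between
 * C99 libraries and VC6. */
int snprintf_terminated(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    ret = vsnprintf(buf, size, fmt, ap);
    va_end(ap);

    if (size > 0 && (ret < 0 || (size_t)ret >= size)) {
        buf[size - 1] = '\0';  /* truncated or failed: force a terminator */
    }
    return ret;
}
```

On a C99 library the extra store is redundant; on a library that behaves as MSDN describes, it costs the last stored character but makes the buffer safe to print.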
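Why returning the formatted length matters
The C99 rule that snprintf() always returns the formatted length, whether or not the buffer is big enough, is in fact very useful, because most of the time we cannot know in advance whether the buffer is big enough. A program therefore usually has to be written like this: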
#include <stdio.h>
#include <stdlib.h>
#ifdef _MSC_VER
# include <io.h>
# define STDOUT_FILENO 1
# define write _write
# define snprintf _snprintf
#else
# include <sys/types.h>
# include <sys/uio.h>
# include <unistd.h>
#endif
/** Write a "Hello, <name>!\n" message to file descriptor STDOUT_FILENO. */
void write_hello(const char* name)
{
    char* pbuf = 0;
    int size;
    // Get required size (relies on the C99 return value), and allocate enough memory
    size = snprintf(0, 0, "Hello, %s!\n", name);
    pbuf = (char*)malloc(size + 1);
    // Do the formatting
    snprintf(pbuf, size + 1, "Hello, %s!\n", name);
    // Write formatted string
    write(STDOUT_FILENO, pbuf, size);
    // Free allocated memory
    free(pbuf);
}
int main()
{
    write_hello("sign"); // 4 chars: total 13 chars when write
    write_hello("jeffhung"); // 8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}
// --[OUTPUT(GCC)]------------------------------------------------------------
// Hello, sign!
// Hello, jeffhung!
// Hello, Honorificabilitudinitatibus!
// --[OUTPUT(VC6)]------------------------------------------------------------
// (crashed)
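The two-pass pattern above can be made a little more defensive by checking every return value. The following is a sketch assuming a C99-conforming snprintf(); hello_alloc is a hypothetical name, and the sketch does not make the pattern work on a library that cannot report the required length. At best it returns NULL there when the sizing call yields a negative value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed "Hello, <name>!\n" string, or
 * NULL on failure. Assumes the C99 behavior that snprintf(NULL, 0, ...)
 * returns the required length. The caller must free() the result. */
char *hello_alloc(const char *name)
{
    char *pbuf;
    int size = snprintf(NULL, 0, "Hello, %s!\n", name);  /* pass 1: measure */

    if (size < 0) {
        return NULL;               /* error, or length not reported */
    }
    pbuf = (char *)malloc((size_t)size + 1);
    if (pbuf == NULL) {
        return NULL;               /* out of memory */
    }
    /* pass 2: format for real; the length must match the measurement */
    if (snprintf(pbuf, (size_t)size + 1, "Hello, %s!\n", name) != size) {
        free(pbuf);
        return NULL;
    }
    return pbuf;
}
```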
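But if snprintf() cannot return the buffer size actually required when the buffer is too small, all we can do is use trial and error to force out the size we really need: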
#include <stdio.h>
#include <stdlib.h>
#ifdef _MSC_VER
# include <io.h>
# define STDOUT_FILENO 1
# define write _write
# define snprintf _snprintf
#else
# include <sys/types.h>
# include <sys/uio.h>
# include <unistd.h>
#endif
/** Write a "Hello, <name>!\n" message to file descriptor STDOUT_FILENO. */
void write_hello(const char* name)
{
    char* pbuf = 0;
    int size = 0;
    int len;
    do {
        size += 16;
        printf("[DEBUG] size == %d\n", size);
        // Allocate a buffer, don't know whether it is big enough or not
        pbuf = (char*)realloc(pbuf, size); // will do malloc if pbuf is NULL
        // ------------------------------------------------------------------
        // Do the formatting
        // ------------------------------------------------------------------
        // MSDN:
        // Let len be the length of the formatted data string (not including
        // the terminating null).
        // - If len < count, then len characters are stored in buffer,
        //   a null-terminator is appended,
        //   and len is returned.
        // - If len = count, then len characters are stored in buffer,
        //   no null-terminator is appended,
        //   and len is returned.
        // - If len > count, then count characters are stored in buffer,
        //   no null-terminator is appended,
        //   and a negative value is returned.
        // ------------------------------------------------------------------
        // Since snprintf in VC may not append a null-terminator, we pass
        // (size - 1) as the 2nd parameter and reserve the last buffer
        // element for appending the null-terminator ourselves.
        // ------------------------------------------------------------------
        len = snprintf(pbuf, (size - 1), "Hello, %s!\n", name);
        printf("[DEBUG] len == %d\n", len);
    } while (len < 0);
    pbuf[len] = '\0';
    // Write formatted string
    write(STDOUT_FILENO, pbuf, len);
    // Free allocated memory
    free(pbuf);
}
int main()
{
    write_hello("sign"); // 4 chars: total 13 chars when write
    write_hello("jeffhung"); // 8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}
// --[OUTPUT(GCC)]------------------------------------------------------------
// [DEBUG] size == 16
// [DEBUG] len == 13
// Hello, sign!
// [DEBUG] size == 16
// [DEBUG] len == 17
// Hello, jeffhun[DEBUG] size == 16
// [DEBUG] len == 36
// Hello, Honorif
// --[OUTPUT(VC6)]------------------------------------------------------------
// [DEBUG] size == 16
// [DEBUG] len == 13
// Hello, sign!
// [DEBUG] size == 16
// [DEBUG] len == -1
// [DEBUG] size == 32
// [DEBUG] len == 17
// Hello, jeffhung!
// [DEBUG] size == 16
// [DEBUG] len == -1
// [DEBUG] size == 32
// [DEBUG] len == -1
// [DEBUG] size == 48
// [DEBUG] len == 36
// Hello, Honorificabilitudinitatibus!
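When we say hello to jeffhung on VC6, the 16-byte buffer is too small, so the loop goes around one more time. If the buffer grows in steps that are small compared with the length of the given name string, the loop has to run many times, calling realloc() over and over, calling snprintf() over and over, and wasting time over and over. Note also that the loop stops as soon as the return value is non-negative, so it depends on the VC6 behavior: with GCC, snprintf() returns 17 and 36 on the first pass, the loop exits at once, the output is truncated as the GCC log shows, and pbuf[len] is written outside the 16-byte buffer.
That is why the C99 design is so efficient: whether or not the buffer is big enough, snprintf() returns the length that would have been written.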
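The cost of the trial-and-error loop can be estimated with a little arithmetic. The helper below is a hypothetical model written for this post, not part of the programs above: it assumes the buffer starts at step bytes and grows by step bytes per pass until it can hold len characters plus the terminator.

```c
/* Hypothetical model of the trial-and-error loop: returns how many
 * snprintf() calls are made before a buffer that grows by `step` bytes
 * per pass can hold `len` characters plus the null terminator.
 * The smallest k with k * step > len is len / step + 1. */
int passes_needed(int len, int step)
{
    return len / step + 1;
}
```

With step 16 this gives 1, 2 and 3 passes for the three greetings (lengths 13, 17 and 36), which matches the VC6 log, while a 1000-character result would take 63 passes. A library that reports the required length needs at most two calls regardless of the length.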
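The solution
A local buffer is more efficient than a dynamic one, both the standard and the non-standard snprintf() behavior have to be handled, and on top of that we would like to call snprintf() as few times as possible. With all that in mind, the program above can be rewritten like this [2]: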
#include <stdio.h>
#include <stdlib.h>
#ifdef _MSC_VER
# include <io.h>
# define STDOUT_FILENO 1
# define write _write
# define snprintf _snprintf
#else
# include <sys/types.h>
# include <sys/uio.h>
# include <unistd.h>
#endif
/** Write a "Hello, <name>!\n" message to file descriptor STDOUT_FILENO. */
void write_hello(const char* name)
{
    char buf[16];
    char* pbuf = buf;
    int pbuf_size = sizeof(buf);
    int len = 0;
    int again = 0;
    printf("[DEBUG] name == \"%s\"\n", name);
    do {
        if (again) {
#ifdef _MSC_VER
            pbuf_size += sizeof(buf); // required size unknown: grow step by step
#else
            pbuf_size = len + 1; // required size known from the last call
#endif
            pbuf = (pbuf == buf) ? malloc(pbuf_size)
                                 : realloc(pbuf, pbuf_size);
        }
        printf("[DEBUG] pbuf_size == %d\n", pbuf_size);
        len = snprintf(pbuf, pbuf_size, "Hello, %s!\n", name);
        printf("[DEBUG] len == %d\n", len);
    } while ((again = ((len < 0) || (pbuf_size <= len))));
#ifdef _MSC_VER
    pbuf[len] = '\0';
#endif
    printf("[DEBUG] {%d} %s", len, pbuf); // to verify the null-terminator
    write(STDOUT_FILENO, pbuf, len);
    if (pbuf != buf) {
        printf("[DEBUG] free pbuf\n");
        free(pbuf);
    }
}
int main()
{
    write_hello("sign"); // 4 chars: total 13 chars when write
    write_hello("jeffhung"); // 8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}
// --[OUTPUT(GCC)]------------------------------------------------------------
// [DEBUG] name == "sign"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] name == "jeffhung"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 17
// [DEBUG] pbuf_size == 18
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// Hello, jeffhung!
// [DEBUG] free pbuf
// [DEBUG] name == "Honorificabilitudinitatibus"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 36
// [DEBUG] pbuf_size == 37
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf
// --[OUTPUT(VC6)]------------------------------------------------------------
// [DEBUG] name == "sign"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] name == "jeffhung"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// Hello, jeffhung!
// [DEBUG] free pbuf
// [DEBUG] name == "Honorificabilitudinitatibus"
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 48
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf
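Written this way, the preprocessor is needed in only two places to reconcile the differences in snprintf() behavior.
Still, it would be silly to have to recall how write_hello() was written and copy it by rote every time snprintf() is needed. Instead, we can rework write_hello() around vsnprintf() to get strprintf(), which works like snprintf() but prints into a C++ std::string rather than into a buffer: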
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string>
#ifdef _MSC_VER
# include <io.h>
# define STDOUT_FILENO 1
# define write _write
# define snprintf _snprintf
# define vsnprintf _vsnprintf
#else
# include <sys/types.h>
# include <sys/uio.h>
# include <unistd.h>
#endif
std::string strprintf(const char* fmt, ...)
{
    char buf[16];
    char* pbuf = buf;
    int pbuf_size = sizeof(buf);
    int len = 0;
    int again = 0;
    va_list ap;
    do {
        if (again) {
#ifdef _MSC_VER
            pbuf_size += sizeof(buf);
#else
            pbuf_size = len + 1;
#endif
            pbuf = (char*)((pbuf == buf) ? malloc(pbuf_size)
                                         : realloc(pbuf, pbuf_size));
        }
        printf("[DEBUG] pbuf_size == %d\n", pbuf_size);
        // A va_list must not be reused once vsnprintf() has consumed it,
        // so start and end it around every call.
        va_start(ap, fmt);
        len = vsnprintf(pbuf, pbuf_size, fmt, ap);
        va_end(ap);
        printf("[DEBUG] len == %d\n", len);
    } while ((again = ((len < 0) || (pbuf_size <= len))));
#ifdef _MSC_VER
    pbuf[len] = '\0';
#endif
    printf("[DEBUG] {%d} %s", len, pbuf); // to verify the null-terminator
    std::string str(pbuf);
    if (pbuf != buf) {
        printf("[DEBUG] free pbuf\n");
        free(pbuf);
    }
    return str;
}
void write_hello(const char* name)
{
    // 9 chars: counting ending \n,
    // but not counting %s replacement and null-terminator
    std::string hello = strprintf("Hello, %s!\n", name);
    write(STDOUT_FILENO, hello.c_str(), hello.length());
}
int main()
{
    write_hello("sign"); // 4 chars: total 13 chars when write
    write_hello("jeffhung"); // 8 chars: total 17 chars when write
    // Longest word in Shakespeare's works
    // @see http://en.wikipedia.org/wiki/Longest_word_in_English
    write_hello("Honorificabilitudinitatibus"); // 27 chars: total 36 chars when write
    return 0;
}
// --[OUTPUT(C99)]------------------------------------------------------------
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 17
// [DEBUG] pbuf_size == 18
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// [DEBUG] free pbuf
// Hello, jeffhung!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 36
// [DEBUG] pbuf_size == 37
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf
// Hello, Honorificabilitudinitatibus!
// --[OUTPUT(VC)]-------------------------------------------------------------
// [DEBUG] pbuf_size == 16
// [DEBUG] len == 13
// [DEBUG] {13} Hello, sign!
// Hello, sign!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == 17
// [DEBUG] {17} Hello, jeffhung!
// [DEBUG] free pbuf
// Hello, jeffhung!
// [DEBUG] pbuf_size == 16
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 32
// [DEBUG] len == -1
// [DEBUG] pbuf_size == 48
// [DEBUG] len == 36
// [DEBUG] {36} Hello, Honorificabilitudinitatibus!
// [DEBUG] free pbuf
// Hello, Honorificabilitudinitatibus!
This way, as long as we are using C++, strprintf() gives us a convenient way to produce formatted strings with the full power of the printf family.
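For plain C, where std::string is not available, a similar helper can hand back a malloc()ed string. The sketch below assumes a C99 library, meaning vsnprintf() reports the required length and va_copy() exists; format_alloc is a hypothetical name. A va_list can only be walked once, so the measuring pass works on a copy and the formatting pass uses the original.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical C counterpart of strprintf(): returns a malloc()ed,
 * null-terminated string, or NULL on failure. Assumes C99 vsnprintf()
 * and va_copy(). The caller must free() the result. */
char *format_alloc(const char *fmt, ...)
{
    va_list ap, ap_size;
    char *pbuf;
    int len;

    va_start(ap, fmt);
    va_copy(ap_size, ap);                    /* separate list for pass 1 */
    len = vsnprintf(NULL, 0, fmt, ap_size);  /* pass 1: measure */
    va_end(ap_size);

    pbuf = (len < 0) ? NULL : (char *)malloc((size_t)len + 1);
    if (pbuf != NULL && vsnprintf(pbuf, (size_t)len + 1, fmt, ap) != len) {
        free(pbuf);                          /* pass 2 did not match pass 1 */
        pbuf = NULL;
    }
    va_end(ap);
    return pbuf;
}
```

The trade-off against the stack-buffer loop is one guaranteed extra vsnprintf() call for short strings in exchange for much simpler control flow.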



4 Comments
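I just use ostringstream; it is simple, easy to use, and safe. Besides, most places that need this kind of thing have no strict performance requirements.
Right, which is why MSVC has _TRUNCATE: http://msdn2.microsoft.com/en-us/library/ms175769.aspx
BTW, Items 2 and 3 of Exceptional C++ Style compare sprintf, snprintf, std::stringstream, std::strstream, and boost::lexical_cast.
The help file of Microsoft Visual C++ Express Edition contains the following:
Visual C++ conforms to the following standards:
ISO C 95
ISO C++ 98
Ecma C++/CLI 05
So Microsoft evidently shied away from C99, presumably out of consideration for programmers used to VC++, being unable to drop existing VC++ features that conflict with C99.
Ha! I ended up here.
Thank you. I just ran into a similar problem on Linux.
When I ran man vsnprintf and saw the sample code, I assumed I could use it as-is, but it turned out to cause problems. Sigh.
Thanks!!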