Ideogram sensitive PostTeaser
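I woke up thirsty in the middle of the night and couldn't get back to sleep, so I started installing WordPress plugins. I came across PostTeaser, which looked pretty good, so I installed it to try it out. PostTeaser is an enhanced version of the excerpt; its description says: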
Post Teaser generates a preview or "teaser" of a post for the main, archive and category pages, with a link underneath to go to the full post page. It includes features to generate a word count, image count, and an estimated reading time.
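Unfortunately, after installing it I found that its word count does not get along with Chinese text, and as a result it can't measure the excerpt length correctly either. For a post that is mostly Chinese, it easily pulls in a huge chunk of text, which defeats the purpose of using an excerpt. I looked at the code and found that this function is the culprit: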
/*** Counts words. PHP's str_word_count() only works for alphabetic characters ***/
function word_count($text) {
    $text = strip_tags($text);
    $text = preg_split("/\s+/", $text);
    $count = count($text);
    return $count;
}
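To make the problem concrete, here is a small sketch that runs the function above on one English and one Chinese sentence. The sample strings are made up for illustration and do not come from the plugin.

```php
<?php
// The plugin's function, copied from above.
function word_count($text) {
    $text = strip_tags($text);
    $text = preg_split("/\s+/", $text);
    $count = count($text);
    return $count;
}

// English: whitespace separates the words, so the count is right.
echo word_count("The quick brown fox"), "\n";       // 4

// Chinese: no whitespace at all, so the whole sentence is one "word".
echo word_count("敏捷的棕色狐狸跳過懶狗"), "\n";       // 1
```

An eleven-character Chinese sentence counts as a single word, so a teaser limited to a given number of words can end up containing most of the post.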
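Clearly, this is yet another case of a foreign developer not understanding how ideographic scripts (ideograms) work: the code splits on whitespace as the word separator and takes the result as the word count, which of course is wrong for Chinese. A better approach would be to split words on whitespace when the text is English, and to treat each character as a word when the text is Chinese[1]. I looked through PHP's multi-byte string functions and found that it is actually quite simple. A Chinese character has a width of 2 but a length of 1, so mb_strwidth() minus mb_strlen() gives the number of Chinese characters that were undercounted: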
SHELL> svn info
Path: .
URL: http://svn.wp-plugins.org/post-teaser/trunk
Repository Root: http://svn.wp-plugins.org
Repository UUID: b8457f37-d9ea-0310-8a92-e5e31aec5664
Revision: 5990
Node Kind: directory
Schedule: normal
Last Changed Author: turnip
Last Changed Rev: 5635
Last Changed Date: 2006-03-24 02:31:41 +0800 (Fri, 24 Mar 2006)
SHELL> svn diff
Index: post-teaser.php
===================================================================
--- post-teaser.php (revision 5990)
+++ post-teaser.php (working copy)
@@ -363,6 +363,12 @@
     $text = strip_tags($text);
     $text = preg_split("/\s+/", $text);
     $count = count($text);
+    if (function_exists('mb_strwidth') && function_exists('mb_strlen')) {
+        while ($t = each($text)) {
+            $t = $t['value'];
+            $count += (mb_strwidth($t, 'UTF-8') - mb_strlen($t, 'UTF-8'));
+        }
+    }
     return $count;
 }
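One caveat for anyone applying this patch today: each() was deprecated in PHP 7.2 and removed in PHP 8.0, so the loop above no longer runs on current PHP. Below is a minimal sketch of the same logic using foreach. The name word_count_patched and the sample strings are assumptions for illustration, not part of the plugin, and the outputs shown assume the mbstring extension is loaded.

```php
<?php
// Sketch of the patched function with foreach in place of each().
// word_count_patched is a hypothetical name, not the plugin's.
function word_count_patched($text) {
    $text = strip_tags($text);
    $tokens = preg_split("/\s+/", $text);
    $count = count($tokens);
    if (function_exists('mb_strwidth') && function_exists('mb_strlen')) {
        foreach ($tokens as $t) {
            // A wide (e.g. Chinese) character has width 2 but length 1,
            // so the difference is the number of wide characters in $t.
            $count += mb_strwidth($t, 'UTF-8') - mb_strlen($t, 'UTF-8');
        }
    }
    return $count;
}

echo word_count_patched("hello world"), "\n";   // 2
echo word_count_patched("PHP很好用"), "\n";      // 4 (1 token + 3 wide characters)
```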
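Well, okay, I admit it: this is only "roughly" the number of undercounted Chinese characters, for example when a token mixes Chinese and English. But since there is really no need to be perfectly precise, roughly is good enough.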
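If an exact count were ever needed, one possible approach is to skip the width arithmetic and match the text directly: one word per Han character, one word per run of anything else that is not whitespace. The following is only a sketch under that assumption. The helper name word_count_exact is hypothetical, and it relies on PCRE's Unicode support (the /u modifier and \p{Han}) rather than on mbstring.

```php
<?php
// Hypothetical helper, not part of PostTeaser: counts each Han character
// as one word and each run of other non-whitespace characters as one word.
function word_count_exact($text) {
    $text = strip_tags($text);
    // /u makes PCRE treat the pattern and the subject as UTF-8.
    $n = preg_match_all('/\p{Han}|[^\s\p{Han}]+/u', $text, $matches);
    // preg_match_all() returns false on error (e.g. invalid UTF-8).
    return $n === false ? 0 : $n;
}

echo word_count_exact("中文字"), "\n";                  // 3
echo word_count_exact("我用PHP寫WordPress外掛"), "\n";   // 7
```

For comparison, the patched function gives 4 for the first string (one whitespace token plus three wide characters) and 6 for the second (one token plus five), so it is off by one in each direction. Note that in this sketch full-width punctuation is not Han, so a punctuation mark sitting between Chinese characters is counted as a word of its own.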
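[1] Strictly speaking, this isn't quite right either, because a Chinese character does not necessarily form a word on its own. In practice, though, this approach is the one that gives the effect we want.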


