Hong Kong Mid-1990s Newspaper Column Corpus (HKMNCC)
The Education University of Hong Kong
Jointly constructed by:
- David C.S. Li (李楚成, PI)
- Cathy S.P. Wong (黃倩萍, Co-I)
- Wai-Mun Leung (梁慧敏, Co-I)
Acknowledgement:
The construction of this corpus and website was supported by a special grant from the Hong Kong Institute of Education. The Institute's support is gratefully acknowledged.
About the HKMNCC corpus:
The HKMNCC corpus (Li, Leung & Wong, 2014) aims to contribute to the study of 'Hong Kong Written Chinese' (HKWC; Shi, 2006). It consists of texts that appeared in Hong Kong Chinese newspaper columns and infotainment stories in the mid-1990s. Unlike formal sections such as hard news stories and feature articles, where Standard Written Chinese (SWC) norms prevail, the writing in the HKMNCC corpus is typically informal, characterized by adherence to the Cantonese vernacular and generous use of elements from English. These English elements may take the form of unintegrated insertions (Muysken, 2000), i.e., English words in Roman script, resulting in 'code-mixing' (中英夾雜, 'mixing Chinese and English'), or of integrated lexical borrowings (借詞, 'loanwords'), e.g., 巴士 (baa1 si2 'bus') and 的士 (dik1 si2 'taxi').
Key corpus information:
- Date of data collection: ca. 1994–1995 (by the PI).
- Date of corpus construction: 2013–2014.
- Main sources: 香港經濟日報 (Hong Kong Economic Times), 信報 (Hong Kong Economic Journal), and 明報 (Ming Pao).
- Newspaper genres & style: HK Chinese newspaper columns & infotainment stories, informal.
- Corpus size: about 600,000 Chinese characters.
- Number of texts: ca. 1xxx clippings.
- Mode of sampling: Collected on an ad hoc basis; clippings were usually kept because they contained some elements of English origin.
- How constructed: Two student assistants typed the original clippings into Word files; each proofread the other's drafts before the files were finalized.
- Precursor of writing style: Compare 三及第 (saam1 kap6 dai2, "imperial examination's three top honours"; Cheung & Bauer, 2002; see also C. M. Wong, 2002).
Research outputs to date:
While processing the data, we noticed that roughly one in every four to five unintegrated insertions was monosyllabic. The high proportion of monosyllabic English words (MEWs) in informal HKWC called for a theoretical explanation. Michael Clyne's (2003) notion of 'facilitation of transference' has proved to offer a theoretically adequate account of the massive transference of MEWs into informal HKWC; building on it, we proposed the 'Monosyllabic Salience Hypothesis' (MSH). For details, please refer to the following outputs:
- Li, David C.S., Wong, Cathy S.P., Leung, Wai-Mun & Wong, Sam T.S. (2016). Facilitation of transference: The case of monosyllabic salience in Hong Kong Cantonese. Linguistics.
- 李楚成、梁慧敏、黃倩萍、黃得森 (in press). 香港粵語「單音節促發論」分析：語言接觸下的新視角 ('An analysis of the "Monosyllabic Salience Hypothesis" in Hong Kong Cantonese: A new perspective from language contact'). 中國社會語言學 (Chinese Journal of Sociolinguistics).
References cited:
Cheung, Kwan-Hin & Bauer, Robert S. (2002). The representation of Cantonese with Chinese characters. Monograph Series No. 18. Berkeley, CA: Journal of Chinese Linguistics.
Clyne, Michael. (2003). Dynamics of language contact. Cambridge: Cambridge University Press.
Muysken, Pieter. (2000). Bilingual speech: A typology of code-mixing. Cambridge: Cambridge University Press.
Shi, Dingxu. (2006). Hong Kong Written Chinese: Language change induced by language contact. Journal of Asian Pacific Communication 16(2). 299–318.
Wong, Chung-ming [黃仲鳴]. (2002). 香港三及第文體流變史 ('A history of the saam-kap-dai genre in Hong Kong'). Hong Kong: Hong Kong Writers' Association.
This website may be cited as:
Li, David C. S., Leung, Wai-Mun & Wong, Cathy Sin-Ping. (2014). Hong Kong Mid-1990s Newspaper Column Corpus (HKMNCC, 香港1990年代中期報章副刊語料庫). Hong Kong: The Education University of Hong Kong.