WebbThe traditional text summarization method usually bases on extracted sentences approach [1], [9]. Summary is made up of the sentences were selected from the original. Therefore, in the meaning and content of the text summaries are usually sporadic, as a result, text summarization lack of coherent and concise. Webb11 nov. 2010 · This paper proposes an automatic method to generate an extractive summary of multiple Vietnamese documents which are related to a common topic by modeling text documents as weighted undirected graphs. It initially builds undirected graphs with vertices representing the sentences of documents and edges indicate the …
VnCoreNLP: A Vietnamese natural language processing toolkit
WebbConstruct a PhoBERT tokenizer. Based on Byte-Pair-Encoding. This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to … WebbThere are two types of summarization: abstractive and extractive summarization. Abstractive summarization basically means rewriting key points while extractive summarization generates summary by copying directly the most important spans/sentences from a document. bitcryptys
Improving Quality of Vietnamese Text Summarization
Webb31 aug. 2024 · Recent researches have demonstrated that BERT shows potential in a wide range of natural language processing tasks. It is adopted as an encoder for many state-of-the-art automatic summarizing systems, which achieve excellent performance. However, so far, there is not much work done for Vietnamese. Webb17 sep. 2024 · The experiment results show that the proposed PhoBERT-CNN model outperforms SOTA methods and achieves an F1-score of 67.46% and 98.45% on two benchmark datasets, ViHSD and ... In this section, we summarize the Vietnamese HSD task [9, 10]. This task aims to detect whether a comment on social media is HATE, … WebbConstruct a PhoBERT tokenizer. Based on Byte-Pair-Encoding. This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods. Parameters vocab_file ( str) – Path to the vocabulary file. merges_file ( str) – Path to the merges file. bit-cryptorising