Python - PoS Tagging and Lemmatization using spaCy

spaCy is one of the best libraries for text analysis. It excels at large-scale information extraction tasks and is one of the fastest NLP libraries in the world, which also makes it one of the best ways to prepare text for deep learning. spaCy is much faster and more accurate than NLTKTagger and TextBlob.

How do you install it?

pip install spacy
python -m spacy download en_core_web_sm
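If the second command is skipped, spacy.load fails with a fairly generic "can't find model" error. A minimal sketch of a wrapper that turns that failure into a hint pointing back to the download command is shown below; `load_model` is a hypothetical helper name, not part of spaCy's API, and it assumes the model was installed as a package with the command above.

```python
import spacy
from spacy.language import Language


def load_model(name: str = "en_core_web_sm") -> Language:
    """Load a trained spaCy pipeline, pointing to the download command if it is missing.

    Hypothetical convenience wrapper, not a spaCy API.
    """
    try:
        return spacy.load(name)
    except OSError as err:
        # spacy.load raises OSError when `name` is neither an installed
        # package nor a path to a pipeline directory.
        raise OSError(
            f"spaCy pipeline {name!r} is not installed. "
            f"Install it with: python -m spacy download {name}"
        ) from err


if __name__ == "__main__":
    try:
        # Prints the pipeline components, e.g. tagger and lemmatizer, if installed
        print(load_model().pipe_names)
    except OSError as err:
        print(err)
```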

Example

# Import the library
import spacy
# Load the small English pipeline
# (download it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
# POS TAGGING
# Process the whole document
text = "My name is Vishesh. I love to work on data science problems. Please check out my github profile!"
doc = nlp(text)
# Print each token with its coarse-grained part-of-speech tag
for token in doc:
    print(token.text, token.pos_)
# Keep only the tokens tagged as verbs
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])
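The verb filter above generalizes to every tag. A minimal sketch that groups token texts by token.pos_ and uses spacy.explain to print a readable description of each tag follows. `tokens_by_pos` is a hypothetical helper, and the demo assigns tags by hand through the spaCy v3 Doc constructor so it runs without a trained model; those tags are illustrative, not model predictions.

```python
from collections import defaultdict

import spacy
from spacy.tokens import Doc


def tokens_by_pos(doc: Doc) -> dict[str, list[str]]:
    """Group token texts by their coarse-grained POS tag (token.pos_)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in doc:
        groups[token.pos_].append(token.text)
    return dict(groups)


if __name__ == "__main__":
    # Tags are assigned by hand (illustrative, not model predictions) so this
    # runs without a trained pipeline. With one installed, use instead:
    #   doc = spacy.load("en_core_web_sm")(text)
    nlp = spacy.blank("en")
    words = ["I", "love", "to", "work", "on", "data", "science", "problems", "."]
    pos = ["PRON", "VERB", "PART", "VERB", "ADP", "NOUN", "NOUN", "NOUN", "PUNCT"]
    doc = Doc(nlp.vocab, words=words, pos=pos)
    for tag, texts in tokens_by_pos(doc).items():
        # spacy.explain turns a tag such as "ADP" into a short description
        print(f"{tag:<6} {spacy.explain(tag)}: {texts}")
```

Returning a plain dict keyed by tag lets you pull out any category in one lookup, for example tokens_by_pos(doc).get("NOUN", []), instead of writing one list comprehension per tag.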
# LEMMATIZATION: the process of grouping together the inflected forms
# of a word so they can be analyzed as a single item, identified by the
# word's lemma, or dictionary form.
import spacy
# Load the small English pipeline (tokenizer, tagger, parser, NER, lemmatizer)
nlp = spacy.load("en_core_web_sm")
# Process the whole document
text = "My name is Vishesh. I love to work on data science problems. Please check out my github profile!"
doc = nlp(text)
# Print each token with its lemma
for token in doc:
    print(token.text, token.lemma_)
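To make the "grouping together the inflected forms" idea from the lemmatization comment concrete, the sketch below maps each lemma to the distinct surface forms that reduce to it. `forms_by_lemma` is a hypothetical helper, and the demo sets lemmas by hand through the spaCy v3 Doc constructor (illustrative values, not model output) so it runs without a trained model; with en_core_web_sm, nlp(text) fills token.lemma_ for you.

```python
from collections import defaultdict

import spacy
from spacy.tokens import Doc


def forms_by_lemma(doc: Doc) -> dict[str, list[str]]:
    """Map each lemma to the distinct (lowercased) surface forms that reduce to it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for token in doc:
        if token.is_punct:
            continue  # punctuation has no inflected forms worth grouping
        form = token.text.lower()
        if form not in groups[token.lemma_]:
            groups[token.lemma_].append(form)
    return dict(groups)


if __name__ == "__main__":
    # Lemmas assigned by hand (illustrative) so this runs without a trained model.
    nlp = spacy.blank("en")
    words = ["She", "works", "today", ",", "worked", "yesterday", "and", "loves", "working", "."]
    lemmas = ["she", "work", "today", ",", "work", "yesterday", "and", "love", "work", "."]
    doc = Doc(nlp.vocab, words=words, lemmas=lemmas)
    for lemma, forms in forms_by_lemma(doc).items():
        print(lemma, "<-", forms)
```

Lowercasing the surface forms keeps "Work" and "work" in the same bucket. Note that the rule-based lemmatizer in en_core_web_sm uses each token's POS tag, so the same word can receive a different lemma depending on how it was tagged.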