- Semester
- SP2026
- Date
- 3/5/26
- Document type
- PE
NLP301c SP26 NO1 PE RE
NLP301c PE INSTRUCTIONS
Student preparation:
- Before proceeding, install Python 3.8 or higher.
- Install NLTK (nltk-3.6.2), downloadable from [link not included].
- Run cmd as administrator, change the current directory to the nltk-3.6.2 folder, and type the following command to set up.
- Once installed, start up the Python interpreter and install the data required for the book by typing:
- >>> import nltk
- >>> nltk.download()
- (Note: The NLTK Downloader window should show the "book" collection as installed.)
Instructions for completing PE:
- For submission: Create a folder named in the format: YourName_YourRollNumber_NLP301c
- Example: Nguyễn Thế Hùng with roll number SE09234 will submit: HungNT_SE09234_NLP301c
- Create separate files named Q1.py, Q2.py, Q3.py, Q4.py and place them in the folder.
- Submit the folder to the server. Incorrect submissions will receive a ZERO.
Questions
Question 1: (2 marks)
Write a program to process a system log message by removing all timestamps (digits) and returning a clean message.
Requirements:
- Remove all digits from the text.
- Normalize spaces (no extra spaces between words).
- Input: A string representing a system log.
- Desired Output: A cleaned string without digits.
- Example:
- Input: "Error404 occurred at 10:45 on server123"
- Output: "Error occurred at: on server"
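One possible approach to Question 1 is sketched below, using only the standard `re` module. The function name `clean_log` and the `tighten_punctuation` flag are hypothetical, not part of the exam. Read literally, the two requirements leave a space before the colon ("at : on"), while the example output shows "at:". The optional flag removes whitespace before punctuation to match the example; whether graders expect that is an assumption.

```python
import re


def clean_log(message, tighten_punctuation=False):
    """Remove all digits from a log message and normalize whitespace.

    tighten_punctuation is a hypothetical option (not in the exam text):
    when True, whitespace left in front of punctuation is also removed.
    """
    # Requirement 1: remove every digit.
    no_digits = re.sub(r"\d+", "", message)

    if tighten_punctuation:
        # Assumption: "at : on" should become "at: on", as in the example.
        no_digits = re.sub(r"\s+([:;,.!?])", r"\1", no_digits)

    # Requirement 2: collapse runs of whitespace and trim both ends.
    return " ".join(no_digits.split())


log = "Error404 occurred at 10:45 on server123"
print(clean_log(log))                            # literal reading of the rules
print(clean_log(log, tighten_punctuation=True))  # matches the example output
```

Calling `split()` with no arguments and re-joining with a single space handles tabs, repeated spaces, and leading or trailing whitespace in one step.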
Question 2: (2 marks)
Write a program to compute the number of unique bigrams (two-word sequences) in a product description.
- Input: A string containing a product description.
- Desired Output: An integer representing the number of unique bigrams.
- Example:
- Input: "smart phone with smart features and smart design"
- Output: 7
- Explanation: (smart phone, phone with, with smart, smart features, features and, and smart, smart design) → unique = 7.
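The counting in the explanation above can be sketched as follows. The function name `count_unique_bigrams` is hypothetical. The sketch assumes whitespace tokenization and case-sensitive comparison, neither of which the question specifies.

```python
def count_unique_bigrams(text):
    """Count distinct adjacent word pairs in a whitespace-tokenized string."""
    # Assumption: words are separated by whitespace; case is kept as-is.
    tokens = text.split()

    # Pair each token with its successor, then deduplicate with a set.
    bigrams = set(zip(tokens, tokens[1:]))
    return len(bigrams)


description = "smart phone with smart features and smart design"
print(count_unique_bigrams(description))  # 7
```

Since NLTK is already installed for this exam, `nltk.bigrams(tokens)` should yield the same pairs as `zip(tokens, tokens[1:])`; the plain version is used here so the sketch runs without any downloads.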
Question 3: (3 marks)
Write a program to extract all forum tags from a post.
A tag:
- Starts with $
- Followed by letters only (no digits, no punctuation).
- Input: A string containing a forum post.
- Desired Output: A list of valid tags in the order they appear.
- Example:
- Input: "Discussing $AI and $Machine Learning in $2025 trends and $DataScience!"
- Output: ['$AI', '$Machine', '$DataScience']
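A minimal sketch for Question 3 using `re.findall` follows. The function name `extract_tags` is hypothetical. Under the stated rule a tag ends at the first non-letter, so `$Machine Learning` yields `$Machine`. Two further assumptions are made: "letters" means ASCII letters, and a token that mixes letters with digits (such as `$AI2`) is rejected entirely rather than truncated.

```python
import re

# "$" followed by one or more ASCII letters. The trailing \b rejects tokens
# where the letters run straight into a digit or underscore (assumption).
TAG_PATTERN = re.compile(r"\$[A-Za-z]+\b")


def extract_tags(post):
    """Return all valid tags in the order they appear in the post."""
    return TAG_PATTERN.findall(post)


post = "Discussing $AI and $Machine Learning in $2025 trends and $DataScience!"
print(extract_tags(post))  # ['$AI', '$Machine', '$DataScience']
```

`findall` scans left to right, so the result preserves the order of appearance without any extra sorting.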
Question 4: (3 marks)
Write a program that checks whether two chatbot responses are semantically identical in terms of word usage.
Rules:
- Ignore case.
- Ignore punctuation.
- Ignore word order.
- Compare based on word frequency.
- Input: Two strings (two sentences).
- Desired Output: True or False.
- Example 1:
- Sentence 1: "AI will change the world"
- Sentence 2: "the world will change AI"
- Output: True
- Example 2:
- Sentence 1: "AI is powerful"
- Sentence 2: "AI is very powerful"
- Output: False
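The four rules above map naturally onto a comparison of word-frequency tables. The sketch below uses `collections.Counter`; the function names `word_counts` and `same_word_usage` are hypothetical. It assumes that "punctuation" means the ASCII characters in `string.punctuation` and that words are separated by whitespace.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def word_counts(sentence):
    """Lowercase, strip punctuation, and count each remaining word."""
    cleaned = sentence.lower().translate(_STRIP_PUNCT)
    return Counter(cleaned.split())


def same_word_usage(sentence1, sentence2):
    """True when both sentences use the same words the same number of times."""
    # Counter equality ignores order but respects frequencies.
    return word_counts(sentence1) == word_counts(sentence2)


print(same_word_usage("AI will change the world", "the world will change AI"))  # True
print(same_word_usage("AI is powerful", "AI is very powerful"))                 # False
```

Comparing `Counter` objects rather than sets matters for repeated words: "a a b" and "a b b" share the same vocabulary but differ in frequency, so they compare as different.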