NLP Practice
Browse questions
200 questions
Introduction
application
medium
A product team is building a voice assistant that accepts ordinary spoken requests instead of commands written in a programming language. What is the central NLP problem being addressed?
Introduction
application
medium
An email client detects meeting details inside a message and creates a calendar entry automatically. Which NLP task is most central to this feature?
Introduction
application
medium
A bank wants to route customer complaints into categories such as card issue, loan issue, and account issue. Which NLP task best fits this requirement?
Introduction
application
medium
A legal search system returns the most relevant case documents for a lawyer's query. Which NLP task is mainly involved?
Introduction
application
medium
A support bot receives the question, 'How do I reset my password?' and returns a direct answer. What task is most directly represented?
Introduction
application
medium
A travel app converts a hotel review from Sinhala into English. Which NLP task is being used?
Introduction
application
medium
A researcher has thousands of unlabeled abstracts and wants to discover their hidden research themes. Which task is most suitable?
Introduction
conceptual
medium
Why is open-domain conversation generally harder than keyword search?
Introduction
conceptual
medium
A system can tokenize text but fails to interpret meaning in context. Which limitation does this reveal?
Introduction
application
medium
A speech-to-text engine struggles mainly with distinguishing small sound units in spoken language. Which building block of language is most relevant?
Introduction
application
medium
A parser identifies that 'the old bridge' is a noun phrase inside a sentence. Which language level is being analyzed?
Introduction
application
medium
A home assistant hears 'Can you crack a window?' and opens the window rather than damaging it. Which kind of interpretation is most important?
Introduction
conceptual
medium
The sentence 'I made her duck' creates difficulty for an NLP system because it has several possible meanings. What challenge does this illustrate?
Introduction
conceptual
medium
A model judges 'dog bit man' as more plausible than 'man bit dog' even when both are grammatically possible. What knowledge is being used?
Introduction
conceptual
medium
Why can an NLP method built for English fail when moved directly to Sinhala or Japanese?
Introduction
conceptual
medium
A student claims deep learning, machine learning, and NLP are unrelated fields. What is the best correction?
Introduction
application
medium
A team has only 50 labeled examples and strong domain rules for identifying spam. What is the most sensible initial approach?
Introduction
conceptual
medium
Why can context-free grammars be useful where simple regular expressions are insufficient?
Introduction
application
medium
A news classifier uses word counts and assumes each feature contributes independently to the class probability. Which model is most aligned with this assumption?
Introduction
application
medium
A text classifier must separate categories with a robust decision boundary and the dataset is moderate in size. Which classical model is a reasonable candidate?
Introduction
application
medium
A POS tagger treats grammatical categories as hidden states that generate observed words. Which model family matches this framing?
Introduction
application
medium
A sequence model labels each word in a sentence as a POS tag while considering neighboring labels. Which model is a natural classical choice?
Introduction
conceptual
medium
Why are RNNs naturally suited for many language tasks?
Introduction
application
medium
A CNN text classifier looks at groups of neighboring words such as 'very good movie'. What property makes CNNs useful here?
Introduction
conceptual
medium
What makes transformers different from traditional sequential RNN processing?
Introduction
application
medium
A small classification task fine-tunes BERT rather than training a language model from scratch. Which idea is being applied?
Introduction
conceptual
medium
An autoencoder is used to convert text into a compressed dense vector before another task. What is the main purpose of the hidden representation?
Introduction
application
medium
A company has a few hundred labeled reviews. A huge deep model fits the training set but performs poorly on new reviews. Which risk is most likely?
Introduction
application
medium
A model trained on news articles performs poorly on informal social-media text. Which limitation is most relevant?
Introduction
application
medium
A tourist translation device must work offline with low memory and low power. Which concern argues against a large cloud-style DL model?
Introduction
application
medium
A voice agent converts a user's speech into text before understanding the request. Which component performs this first conversion?
Introduction
application
medium
A voice agent turns the text 'The lights have been dimmed' into spoken audio for the user. Which component is responsible?
Introduction
application
medium
A conversational agent extracts 'Colombo' as a location and 'tomorrow' as a date from a user's request. Which NLU subtask is most involved?
Introduction
application
medium
A chatbot understands that 'he' and 'Dr. Silva' refer to the same person in a conversation. Which task is being performed?
Introduction
application
medium
A user says 'Play Mozart songs' and the system decides this is a music command rather than a weather question. Which component is mainly involved?
Introduction
application
medium
After identifying a user's intent, a voice assistant chooses whether to answer, play music, dim lights, or ask for clarification. What stage is this?
Introduction
application
medium
A single application classifies support tickets, retrieves related FAQ pages, and generates a short answer. What is the best interpretation?
Introduction
application
medium
A startup wants the fastest reliable baseline for a narrow text-classification task. What is the best modelling mindset?
Introduction
application
medium
A model must classify documents and the engineer values a quick baseline with interpretable word-count evidence. Which pair is most reasonable?
Introduction
conceptual
medium
Why might both CNNs and transformers improve over simple BoW for sentiment analysis?
Introduction
application
hard
An NLP project has limited labels, strict latency, and a need for explanations. Which option best follows the lecture's practical warning about DL?
Introduction
application
medium
A document parser must extract email IDs and dates reliably from forms. Which approach is a sensible early component?
Introduction
application
medium
A knowledge base records that a wheel is part of a car. Which WordNet-style relation is closest?
Introduction
application
medium
A taxonomy records that a sparrow is a type of bird. Which relation is closest?
Introduction
application
medium
A supervised spam detector learns from emails paired with spam/non-spam labels. What learning paradigm is this?
Introduction
application
medium
A clustering system groups customer reviews into hidden themes without labels. What learning paradigm is this?
Introduction
application
medium
A dialog policy improves based on rewards for successful task completion. Which learning paradigm is closest?
Introduction
conceptual
medium
What is the difference between semantics and pragmatics in an NLP interpretation task?
Introduction
conceptual
medium
Why is a language-agnostic NLP solution difficult to build?
Introduction
application
medium
A deep model performs well in development but fails in production because the training set was too small. Which project decision should be reconsidered?
Introduction
conceptual
medium
Why do common sense and world knowledge remain difficult in NLP?
Introduction
application
medium
A voice assistant pipeline receives audio, transcribes it, extracts intent, chooses an action, and speaks back. What does this illustrate?
Introduction
application
medium
A social media monitoring tool identifies people, organizations, and locations mentioned in posts. Which task is most appropriate?
Introduction
application
hard
A pretrained transformer trained mostly on formal news is fine-tuned for slang-heavy customer chats and underperforms. Which explanation is most likely?
Introduction
conceptual
medium
Why should an NLP engineer not choose a method only because it is newest?
Introduction
application
medium
An exam platform scores written answers and checks plagiarism. Why is NLP relevant?
Introduction
application
medium
A search system answers factual questions using a large structured knowledge base. Which resource type is being used?
Introduction
conceptual
hard
Which statement best captures the lecture's balanced view of deep learning for NLP?
Introduction
application
medium
A chatbot retrieves a known answer from a knowledge base and fills a response template. Which final stage is mainly involved?
Introduction
application
medium
A project has labeled examples, converts documents into features, then learns a classifier. Which high-level approach is this?
Introduction
application
medium
A domain expert writes rules to identify job titles in offer letters before enough data exists. Which approach is being used?
Introduction
application
medium
In sentiment classification, why might a CNN look at 'not good' as a phrase rather than separate words?
Introduction
conceptual
medium
Why can long text be problematic for a basic RNN?
Introduction
application
medium
A user says, 'Book a taxi to the airport tomorrow morning.' The system extracts the destination and time, then decides the booking intent. Which combination is being used?
Introduction
conceptual
medium
Why is choosing an NLP approach task-dependent?
NLP Pipeline
conceptual
medium
Why is an NLP pipeline useful in a real project?
NLP Pipeline
application
medium
A team collects raw reviews, removes HTML, tokenizes text, builds features, trains a model, evaluates it, deploys it, and watches performance. What does this sequence represent?
NLP Pipeline
application
medium
A team starts with a public review dataset, adds a small labeled custom set, and augments examples. Why is this reasonable?
NLP Pipeline
application
medium
A team scrapes forum posts for complaints. Which risk must be considered before using the data?
NLP Pipeline
application
medium
A product team adds a feedback button so users can label incorrect recommendations. Which data acquisition method does this resemble?
NLP Pipeline
application
medium
An English sentence is translated to German and then back to English to create a slightly varied training example. What is this called?
NLP Pipeline
conceptual
medium
Why might TF-IDF-based word replacement be safer than random replacement in augmentation?
NLP Pipeline
application
medium
A sentence 'I am going to the supermarket' is changed by flipping the bigram 'going to' into 'to going'. What is the likely purpose?
NLP Pipeline
application
medium
A team uses Snorkel-style labeling functions to create weak labels automatically. What problem is this addressing?
NLP Pipeline
application
medium
A crawler extracts website text but the output contains navigation menus and JavaScript. What should the team do?
NLP Pipeline
application
medium
A multilingual social-media pipeline crashes because emojis and symbols are encoded inconsistently. Which cleanup step is most relevant?
NLP Pipeline
application
medium
A PDF parser returns text in the wrong order and misses table structure. Why is this a pipeline concern?
NLP Pipeline
application
medium
An OCR system reads a scanned document but produces spelling errors because the scan quality is low. What is a reasonable next step?
NLP Pipeline
application
medium
A voice-based customer service system uses ASR output as its text input. What issue should the team expect?
NLP Pipeline
application
medium
A naive sentence splitter breaks 'Dr. Perera arrived. He spoke.' after 'Dr.' What problem does this show?
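The splitting failure in this card, and one common remedy, can be sketched in a few lines of Python. This is only an illustration: the abbreviation list and both function names are hypothetical, and production splitters rely on much larger lists or trained models.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (an assumption for this sketch).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "prof."}

def naive_split(text):
    """Split after every full stop that is followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]

def split_with_abbreviations(text):
    """End a sentence at a full stop unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that has no final full stop
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Perera arrived. He spoke."
print(naive_split(text))
print(split_with_abbreviations(text))
```

The naive version breaks after 'Dr.', while the second keeps the title attached to the name.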
NLP Pipeline
application
medium
A tokenizer splits O'Neil into separate pieces and treats '$10,000' differently from '€1000'. What lesson follows?
NLP Pipeline
application
medium
A system should treat 'N.Y.' as one token rather than splitting at every punctuation mark. What should be added?
NLP Pipeline
application
medium
A news classification model removes words like 'the', 'of', and 'in'. What is the intended effect?
NLP Pipeline
conceptual
medium
Why is there no universal stop-word list suitable for every NLP task?
NLP Pipeline
application
medium
A classifier treats 'Apple' and 'apple' as the same token for product reviews. What preprocessing step is being used?
NLP Pipeline
application
medium
A system maps 'better' to 'good' by considering linguistic information. Which technique is being used?
NLP Pipeline
conceptual
medium
Why can stemming be risky for tasks requiring linguistically valid base forms?
NLP Pipeline
conceptual
hard
In some pipelines, why should lemmatization happen before token removal or lowercasing?
NLP Pipeline
application
medium
A crawler gathers reviews in English, Sinhala, and Tamil. What should likely happen early in the pipeline?
NLP Pipeline
application
medium
A user writes Sinhala words using Roman letters mixed with English words. Which preprocessing issues are present?
NLP Pipeline
application
medium
A system must identify that 'Satya Nadella' is related to 'Microsoft' through the CEO relation. Which advanced steps are most relevant?
NLP Pipeline
conceptual
medium
Why might stop-word removal be bad for calendar-event extraction from emails?
NLP Pipeline
conceptual
hard
Why should POS tagging not usually be preceded by heavy lowercasing, stop-word removal, or token deletion?
NLP Pipeline
application
medium
After preprocessing, a team converts text into numeric vectors for a classifier. Which pipeline stage is this?
NLP Pipeline
conceptual
medium
What is a tradeoff of deep-learning pipelines that learn features automatically?
NLP Pipeline
application
medium
An email system uses a blacklist of domains before enough labeled data exists. What role do such rules play?
NLP Pipeline
application
medium
Several spam heuristics are individually deterministic but jointly fuzzy. What is a good way to use them with ML?
NLP Pipeline
application
medium
A rule detects phrases that indicate spam with 99% confidence. How should it be used?
NLP Pipeline
application
medium
A spam system combines a heuristic score, Naive Bayes output, and LSTM output, then feeds them to logistic regression. What technique is this?
NLP Pipeline
conceptual
medium
How is an ensemble different from stacking in the lecture's framing?
NLP Pipeline
application
medium
A small dataset task starts from BERT rather than learning representations from scratch. Which improvement strategy is this?
NLP Pipeline
conceptual
medium
What does model performance on unseen data primarily measure during evaluation?
NLP Pipeline
application
medium
A spam classifier reports precision, recall, F1, and a confusion matrix on a labeled test set. What type of evaluation is this?
NLP Pipeline
application
medium
A spam filter has high F1 but users still waste time checking wrongly placed emails. Which evaluation is revealing the problem?
NLP Pipeline
application
medium
A machine translation output says 'I ate three hazelnuts' while the reference says 'I ate three filberts'. Automatic metrics mark it wrong although meaning is close. What limitation is shown?
NLP Pipeline
conceptual
medium
Why do teams usually perform intrinsic evaluation before extrinsic evaluation?
NLP Pipeline
conceptual
hard
A model does well intrinsically but badly extrinsically. What is a likely explanation?
NLP Pipeline
application
medium
After launch, a dashboard tracks prediction distributions and performance over time. Which stage is this?
NLP Pipeline
application
medium
A model's accuracy drops after user behavior changes, so the team retrains or updates it. Which stage is this?
NLP Pipeline
application
medium
A pipeline designed for English fails on a language with complex morphology. What should the team do?
NLP Pipeline
application
medium
In the COTA pipeline, ticket text is cleaned, tokenized, lowercased, stripped of stop words, and lemmatized. What representation follows?
NLP Pipeline
application
medium
COTA uses TF-IDF and LSI to map ticket text into a topic space. Which stage is this?
NLP Pipeline
application
medium
COTA creates a vector where each element is ticket similarity to one possible solution. What measure is used?
NLP Pipeline
application
medium
COTA uses MRR and also measures overall effectiveness in production. What does this combine?
NLP Pipeline
conceptual
medium
Why does the generic pipeline include feedback from evaluation to preprocessing or data acquisition?
NLP Pipeline
application
medium
A Sri Lankan social-media sentiment project contains Sinhala, English, and Sinhala written in Roman script. Which preprocessing plan is most appropriate?
NLP Pipeline
application
medium
A mature NLP system keeps both ML models and high-confidence rules. What is the best interpretation?
NLP Pipeline
application
hard
A deployed classifier's test-set score was high, but after three months the live input distribution changes and errors rise. Which pipeline stages are now most important?
NLP Pipeline
conceptual
medium
Why should preprocessing be chosen based on the application rather than applied blindly?
NLP Pipeline
application
medium
A document processing system handles HTML pages, PDFs, and scanned images. What stage should focus on converting each source into usable text?
NLP Pipeline
conceptual
medium
Why are standard classification metrics not enough for all NLP tasks?
NLP Pipeline
application
medium
A rule catches a legally required phrase that appears rarely in training data. Why keep this heuristic in an ML pipeline?
NLP Pipeline
application
medium
A system must split text into meaningful word units before vectorization. Which step is required?
NLP Pipeline
application
medium
A product team defines success as 'users spend less time handling spam' rather than only high F1. Which evaluation view is this?
NLP Pipeline
application
medium
A dataset is too small for a custom scenario, so the team uses synonym replacement, back translation, and weak-labeling tools. What is the goal?
NLP Pipeline
application
medium
A task is common enough that cloud NLP services already solve it reasonably. What should a practical team do first?
NLP Pipeline
application
medium
A text analysis task cares about dictionary-correct base forms because output will be shown to linguists. Which normalization is preferable?
NLP Pipeline
application
medium
A calendar extractor fails after removing 'at', 'on', and 'from'. What caused the failure?
NLP Pipeline
application
medium
A QA model's exact-match score is low because many correct answers are paraphrases. What should be considered?
NLP Pipeline
conceptual
medium
Why is it sensible to start simple and add complexity over time?
Text Representation
conceptual
medium
Why is text representation necessary in NLP?
Text Representation
conceptual
medium
Why can poor text features lead to poor NLP performance?
Text Representation
conceptual
medium
Two documents are represented as vectors and compared by the angle between them. Which similarity measure is being used?
Text Representation
calculation
medium
If two non-zero document vectors point in exactly the same direction, what is their cosine similarity?
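The angle-based comparison in the two cards above can be checked numerically. The sketch below is a minimal pure-Python version; the three example vectors are made up for illustration and do not come from the question bank.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc_a = [1, 2, 0]
doc_b = [3, 6, 0]  # same direction as doc_a, three times as long
doc_c = [0, 0, 5]  # shares no non-zero dimension with doc_a

same_direction = cosine_similarity(doc_a, doc_b)
orthogonal = cosine_similarity(doc_a, doc_c)
print(same_direction, orthogonal)
```

Scaling a vector changes its length but not its direction, so the first value is 1 up to floating-point rounding, and the second is 0.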
Text Representation
conceptual
medium
What is the core idea of a vector space model for text?
Text Representation
application
medium
A vocabulary has 20,000 unique terms. In one-hot encoding, how long is each word vector?
Text Representation
application
medium
Why can one-hot encoding be inefficient for large real-world corpora?
Text Representation
application
medium
A one-hot-based model trained on {dog, bites, man} receives the word 'fruit' at runtime. What problem appears?
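The runtime situation in this card can be reproduced with a toy encoder. This is a sketch only: the function name is hypothetical, and returning `None` for an unknown word is an assumption made here for illustration (real systems often map such words to a reserved unknown token instead).

```python
vocabulary = ["dog", "bites", "man"]  # training vocabulary from the question

def one_hot(word, vocabulary):
    """Return a one-hot list, or None when the word is out of vocabulary."""
    if word not in vocabulary:
        return None  # assumption: signal the unknown word with None
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

known = one_hot("bites", vocabulary)
unknown = one_hot("fruit", vocabulary)
print(known, unknown)
```

Each vector is as long as the vocabulary, and a word never seen in training has no dimension to switch on.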
Text Representation
application
medium
A document is represented by counting how many times each vocabulary word appears. Which representation is this?
Text Representation
application
medium
Why do 'Dog bites man' and 'Man bites dog' get the same BoW representation in the toy example?
Text Representation
application
medium
A sentiment model only needs to know whether certain positive or negative words occur, not how often. Which option is most appropriate?
Text Representation
calculation
medium
In an n-gram model, what is n=2 commonly called?
Text Representation
calculation
medium
In an n-gram model, what is n=3 commonly called?
Text Representation
conceptual
medium
What is the main tradeoff when increasing n in a bag-of-n-grams model?
Text Representation
application
medium
A word appears often in one document but rarely across the corpus. What should happen to its TF-IDF score?
Text Representation
conceptual
medium
Why does IDF downweight words such as 'is', 'are', and 'am'?
Text Representation
conceptual
medium
What shared drawback affects BoW, BoN, and TF-IDF?
Text Representation
conceptual
medium
What problem do distributed representations try to solve?
Text Representation
application
medium
In 'NLP rocks', the word 'rocks' means something positive rather than stones. Which idea explains this?
Text Representation
conceptual
medium
How is a distributed representation different from a distributional sparse representation?
Text Representation
conceptual
medium
What is an embedding in this lecture's terminology?
Text Representation
conceptual
medium
Why are Word2vec vectors more useful than one-hot vectors for many ML tasks?
Text Representation
application
medium
The relationship 'king - man + woman ≈ queen' is used to show what property?
Text Representation
conceptual
medium
Why should a team often start with pre-trained embeddings if they suit the project?
Text Representation
application
medium
A call to most_similar('beautiful') returns ranked words with scores. What do higher scores indicate?
Text Representation
application
medium
A model predicts surrounding context words from a center word. Which Word2vec architecture is this?
Text Representation
calculation
medium
For CBOW with window parameter k=2, how many context words surround a center word in the middle of a sentence?
Text Representation
calculation
medium
For SkipGram with k=2, how many (center, context) training pairs are produced for one center word away from sentence boundaries?
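The window arithmetic in the CBOW and SkipGram cards above can be verified by enumerating pairs. In this sketch the function name and the example sentence are assumptions chosen for illustration, and `k` is read as the number of words taken on each side of the center word.

```python
def skipgram_pairs(tokens, k):
    """List (center, context) pairs using up to k words on each side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - k)                 # clip the window at the sentence start
        hi = min(len(tokens), i + k + 1)   # clip the window at the sentence end
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "i am happy because i am learning".split()
pairs = skipgram_pairs(tokens, k=2)

# Pairs whose center is 'because', which sits away from both boundaries.
center_pairs = [p for p in pairs if p[0] == "because"]
print(center_pairs)
```

A center word in the middle of the sentence sees k words on each side; words near the boundaries see fewer, which is why the questions specify a position away from them.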
Text Representation
application
medium
A corpus contains many unique phone numbers that are not meaningful individually. What preprocessing is sensible before training embeddings?
Text Representation
calculation
medium
Given vocabulary {am, because, happy, I, learning}, what is the length of each one-hot word vector?
Text Representation
conceptual
medium
Why are softmax and cross-entropy used in Word2vec training?
Text Representation
conceptual
hard
Why are embeddings considered by-products of CBOW training rather than direct labels?
Text Representation
application
medium
A team tests embeddings using analogies such as 'France:Paris :: Italy: ?'. What evaluation type is this?
Text Representation
conceptual
medium
Why can extrinsic evaluation of embeddings be hard to troubleshoot?
Text Representation
application
medium
A developer sets vector dimension, context window, min_count, workers, and sg in gensim Word2Vec. What are these settings?
Text Representation
application
medium
A document vector is built by averaging all word embeddings in the document. What is the advantage and limitation?
Text Representation
conceptual
medium
Why can OOV words harm a production NLP model?
Text Representation
application
medium
A system builds a vector for an unseen word by combining character n-gram vectors. Which model family supports this?
Text Representation
conceptual
medium
Why is fastText more robust to some OOV problems than Word2vec?
Text Representation
application
medium
A team reduces 300-dimensional word vectors to two dimensions to visually inspect clusters. Which technique is appropriate?
Text Representation
application
medium
A grammar-correction product combines embeddings with rules about common grammar mistakes. What representation strategy is this?
Text Representation
conceptual
medium
When should handcrafted features still be considered?
Text Representation
conceptual
hard
Why might embeddings outperform BoW on documents using synonyms?
Text Representation
application
medium
A classifier must distinguish 'not satisfied' from 'satisfied'. Which representation is better than unigram BoW for capturing this local phrase?
Text Representation
conceptual
medium
Why does high dimensionality hamper learning in sparse text vectors?
Text Representation
application
medium
A very long document contains a term 10 times and a short document contains it 3 times. Why normalize term frequency by document length?
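The effect of length normalization in this card can be seen with concrete numbers. The question gives only the raw counts (10 and 3); the document lengths of 1,000 and 100 tokens below are assumptions added purely for illustration.

```python
def term_frequency(count, doc_length):
    """Raw count divided by the number of tokens in the document."""
    return count / doc_length

# Raw counts come from the question; the lengths are hypothetical.
long_doc_tf = term_frequency(count=10, doc_length=1000)
short_doc_tf = term_frequency(count=3, doc_length=100)

print(long_doc_tf, short_doc_tf)
print(short_doc_tf > long_doc_tf)
```

Under these assumed lengths the term is relatively more prominent in the short document even though its raw count is lower.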
Text Representation
conceptual
medium
Why can TF-IDF scores differ between a manual formula and scikit-learn output?
Text Representation
conceptual
hard
Why can embedding analogies be impressive but not a complete guarantee of task success?
Text Representation
application
medium
A language model assigns high probability to 'The cat jumped over the dog' and low probability to 'jumped over the the cat dog'. What is it judging?
Text Representation
application
medium
Before training Word2vec on tweets, the team converts repeated punctuation and preserves meaningful hashtags. Why?
Text Representation
calculation
medium
For vocabulary {am, because, happy, I, learning}, the context vector averages one-hot vectors for {I, am, because, I}. What value appears in the 'I' dimension?
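The averaging step in this card can be worked through directly. The sketch below uses the vocabulary and context words from the question; the helper names are hypothetical.

```python
vocab = ["am", "because", "happy", "I", "learning"]
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """One-hot vector over the five-word vocabulary."""
    vec = [0.0] * len(vocab)
    vec[index[word]] = 1.0
    return vec

def average_context(words):
    """Element-wise mean of the one-hot vectors of the context words."""
    total = [0.0] * len(vocab)
    for word in words:
        for i, value in enumerate(one_hot(word)):
            total[i] += value
    return [t / len(words) for t in total]

context_vector = average_context(["I", "am", "because", "I"])
print(dict(zip(vocab, context_vector)))
```

Because 'I' occurs twice among the four context words, its dimension receives twice the weight of 'am' or 'because'.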
Text Representation
conceptual
medium
If W1 has shape N x v in a CBOW model, why can its columns be used as word embeddings?
Text Representation
conceptual
medium
Why is extrinsic evaluation described as the ultimate test for embeddings?
Text Representation
application
medium
A long review vector is made by summing or averaging word vectors. What major information may be lost?
Text Representation
application
medium
A t-SNE visualization shows country names clustering together. How should it be interpreted?
Text Representation
conceptual
medium
Why is cosine similarity common for text vectors?
Text Representation
conceptual
hard
Which statement best compares TF-IDF and Word2vec?
Text Representation
application
medium
A domain has enough specialized text and many important terms absent from public embeddings. What should the team consider?
Text Representation
conceptual
medium
Why do many industrial systems use hybrid features?
Text Representation
calculation
medium
Using vocabulary [dog, bites, man, meat, food, eats], what is the BoW vector for 'Dog bites man' after lowercasing?
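The count vector in this card can be computed with a short helper. This is a minimal sketch that assumes whitespace tokenization; the function name is hypothetical.

```python
from collections import Counter

def bow_vector(text, vocabulary):
    """Count each vocabulary term in the lowercased, whitespace-split text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

vocabulary = ["dog", "bites", "man", "meat", "food", "eats"]
v1 = bow_vector("Dog bites man", vocabulary)
v2 = bow_vector("Man bites dog", vocabulary)
print(v1, v2, v1 == v2)
```

The two sentences yield the same vector because the representation records counts only and discards word order.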
Text Representation
calculation
medium
Given bigram vocabulary [dog bites, bites man, man bites, bites dog], what is the vector for 'Dog bites man'?
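The bigram vector in this card follows the same counting idea, applied to adjacent word pairs. The sketch assumes lowercasing and whitespace tokenization; the function name is hypothetical.

```python
def bigram_vector(text, bigram_vocabulary):
    """Count each vocabulary bigram among the adjacent word pairs of the text."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return [bigrams.count(b) for b in bigram_vocabulary]

bigram_vocabulary = ["dog bites", "bites man", "man bites", "bites dog"]
b1 = bigram_vector("Dog bites man", bigram_vocabulary)
b2 = bigram_vector("Man bites dog", bigram_vocabulary)
print(b1, b2)
```

Unlike unigram counts, the two sentences now receive different vectors, since bigrams retain some local word order.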
Text Representation
application
medium
A term occurs in every document of a corpus. How should IDF generally treat it?
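The behaviour asked about in this card can be checked with the textbook definition idf(t) = log(N / df(t)). The toy corpus below is made up for illustration, and the unsmoothed formula is an assumption: libraries such as scikit-learn apply smoothing by default, so their values differ.

```python
import math

def idf(term, documents):
    """Unsmoothed inverse document frequency: log(N / df)."""
    n_docs = len(documents)
    df = sum(1 for doc in documents if term in doc)
    if df == 0:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(n_docs / df)

# Hypothetical three-document corpus, each document stored as a set of tokens.
documents = [
    {"the", "dog", "bites"},
    {"the", "man", "eats"},
    {"the", "food"},
]

everywhere = idf("the", documents)  # occurs in every document
rare = idf("dog", documents)        # occurs in one document
print(everywhere, rare)
```

A term found in every document gets log(1) = 0 under this definition, so it contributes nothing to the TF-IDF score.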
Text Representation
application
medium
A representation has 300 dimensions and most values are non-zero. Which type is this most likely?
Text Representation
application
medium
A grammar checker uses embeddings for context but also explicit rules for subject-verb agreement. Why is this reasonable?
Text Representation
application
medium
A data scientist wants to inspect whether embeddings group synonyms, countries, and verbs separately. What is an appropriate exploratory tool?
Text Representation
conceptual
medium
Why do n-grams not fully solve the limitations of BoW?
Text Representation
conceptual
medium
Why does the context window hyperparameter matter in Word2vec?
Text Representation
application
medium
A very large corpus cannot fit into memory as a list of token lists. Which gensim input strategy is more suitable?
Text Representation
application
medium
A hospital NLP system uses general web embeddings and fails on clinical abbreviations. What is the most likely representation issue?
Text Representation
conceptual
medium
What is the overall practical choice among vectorization, embeddings, and handcrafted features?