
Natural language processing (NLP) is used for tasks such as sentiment analysis, topic detection, language detection, key phrase extraction, and document categorization.

NLP can be used to classify documents, such as labeling documents as sensitive or spam. The output of NLP can be used for subsequent processing or search. Another use for NLP is to summarize text by identifying the entities present in the document. These entities can also be used to tag documents with keywords, which enables search and retrieval based on content. Entities might be combined into topics, with summaries that describe the important topics present in each document. The detected topics may be used to categorize the documents for navigation, or to enumerate related documents given a selected topic. Another use for NLP is to score text for sentiment, to assess the positive or negative tone of a document.

These approaches use many techniques from natural language processing, such as:

- Tokenization: splitting the text into words or phrases.
- Stemming and lemmatization: normalizing words so that different forms map to the canonical word with the same meaning. For example, "running" and "ran" map to "run."
- Part-of-speech detection: identifying text as a verb, noun, participle, verb phrase, and so on.
- Sentence boundary detection: detecting complete sentences within paragraphs of text.

When using NLP to extract information and insight from free-form text, the starting point is typically the raw documents stored in object storage such as Azure Storage or Azure Data Lake Store.

Processing a collection of free-form text documents is typically both computationally intensive and time consuming. Without a standardized document format, it can be difficult to achieve consistently accurate results when using free-form text processing to extract specific facts from a document. For example, think of a text representation of an invoice: it can be difficult to build a process that correctly extracts the invoice number and invoice date for invoices across any number of vendors.

What are your options when choosing an NLP service?

In Azure, the following services provide natural language processing (NLP) capabilities:

- Azure HDInsight with Spark and Spark MLlib
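
To make the preprocessing techniques mentioned in this section more concrete (detecting sentence boundaries, splitting text into words, and mapping word forms to a canonical word), here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not how any Azure service or Spark MLlib implements these steps: the function names are hypothetical, the sentence splitter is a naive regular expression, and the `CANONICAL_FORMS` table is a hand-written stand-in for a real stemmer or lemmatizer.

```python
import re

# Hypothetical lookup table standing in for a real stemmer or lemmatizer.
# It only covers the example from the text ("running" and "ran" map to "run").
CANONICAL_FORMS = {"running": "run", "ran": "run", "runs": "run"}


def split_sentences(text):
    """Naive sentence boundary detection: break after '.', '!' or '?'
    when it is followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def normalize(tokens):
    """Map each token to its canonical form when one is known."""
    return [CANONICAL_FORMS.get(token, token) for token in tokens]


def preprocess(text):
    """Return one list of normalized tokens per detected sentence."""
    return [normalize(tokenize(sentence)) for sentence in split_sentences(text)]


if __name__ == "__main__":
    print(preprocess("She ran home. He is running late!"))
    # [['she', 'run', 'home'], ['he', 'is', 'run', 'late']]
```

A regular expression like this one will mis-split abbreviations such as "Dr." and a lookup table cannot generalize to unseen words, which is part of why free-form text processing usually relies on a dedicated NLP library or service rather than hand-written rules.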