| Title: | Chatting with Nature Journals Current Issue using a local Language Model |
|---|---|
| Description: | The goal of NatChat is to provide fast, local-language-model-powered summaries of articles from the current issues of Nature journals, making cutting-edge science more accessible and digestible. The package includes functions to identify available journals, retrieve articles from the latest issues, construct prompts for summarization and generate natural-language summaries using large language models (LLMs) via the 'ollama' interface. Output can be formatted for use in markdown tables, reports, or summaries. This tool is particularly useful for researchers, educators, and clinicians who want to stay up to date with the latest literature across multiple disciplines. |
| Authors: | Monah Abou Alezz [aut, cre, cph] (ORCID: <https://orcid.org/0000-0002-2006-4250>) |
| Maintainer: | Monah Abou Alezz <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.1.0 |
| Built: | 2026-06-06 07:51:04 UTC |
| Source: | https://github.com/monahton/NatChat |
Adds a prompt column to a data frame of scientific articles, suitable for use with a language model summarization tool.
Prompts are generated using the build_prompt() function, based on article titles and abstracts.
```r
add_prompt(article, ...)
```
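| Argument | Description |
|---|---|
| `article` | A data frame or tibble containing at least the columns `title` and `abstract`. |
| `...` | Additional arguments passed to `build_prompt()` (e.g., `nsentences`). |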
The function checks for the presence of required columns before proceeding. It applies build_prompt() row-wise to generate summarization prompts.
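The column check and row-wise step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation. `toy_build_prompt()` and `toy_add_prompt()` are hypothetical stand-ins, and `mapply()` is used here as one possible way to apply the builder row by row.

```r
# Hypothetical stand-in for build_prompt(); the real prompt text differs.
toy_build_prompt <- function(title, abstract, nsentences = 3L) {
  paste0("Summarize in ", nsentences, " sentences.\n",
         "Title: ", title, "\nAbstract: ", abstract)
}

# Minimal sketch of add_prompt()-style behaviour (assumed, not the package code).
toy_add_prompt <- function(article, ...) {
  required <- c("title", "abstract")
  missing_cols <- setdiff(required, names(article))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Build one prompt per row; extra arguments (e.g. nsentences) are forwarded.
  article$prompt <- mapply(toy_build_prompt, article$title, article$abstract,
                           MoreArgs = list(...), USE.NAMES = FALSE)
  class(article) <- c("article_prompt", class(article))
  article
}

papers <- data.frame(
  title = c("Paper A", "Paper B"),
  abstract = c("Abstract A.", "Abstract B."),
  stringsAsFactors = FALSE
)
with_prompts <- toy_add_prompt(papers, nsentences = 2L)
cat(with_prompts$prompt[1])
```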
This function is typically used after retrieving articles via get_articles(), to prepare data for summarization by a language model (e.g., using ollamar::generate()).
A modified data frame of class article_prompt, including an additional column "prompt" containing structured summarization prompts.
```r
## Not run:
papers <- get_articles("Nature Medicine")
papers_with_prompts <- add_prompt(papers, nsentences = 3)
cat(papers_with_prompts$prompt[1])
## End(Not run)
```
This function generates concise summaries for each article in a given data frame, using the specified language model (LLM). The summaries are generated based on the prompts previously added to the data frame.
```r
add_summary(article, model = "llama3.1", host = NULL)
```
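| Argument | Description |
|---|---|
| `article` | A data frame or tibble containing at least a `prompt` column, as added by add_prompt(). |
| `model` | Character string. The name of the LLM to use for generating summaries. Default is "llama3.1". |
| `host` | Character string or NULL. The host to be used for the Ollama connection. Default is NULL. |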
The function iterates over each article and generates a summary using the specified LLM model. A progress bar is shown to track the summarization process. Any newlines within the text fields are removed to ensure clean formatting. This function is typically used after applying add_prompt() to prepare a dataset for summarization.
The progress bar updates for each article as the summaries are being generated. The final summary column will contain the output of the summarization process, ready for further processing or analysis.
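The loop described above can be sketched with base R's text progress bar. This is an assumption-laden illustration, not the package's code. `fake_summarizer()` is a hypothetical offline stand-in for the LLM call, and this sketch strips newlines only from the generated summary.

```r
# Hypothetical stand-in for the LLM call so the sketch runs offline.
fake_summarizer <- function(prompt) {
  paste("Summary of:", prompt)
}

# Minimal sketch of an add_summary()-style loop (assumed, not the package code).
toy_add_summary <- function(article, summarize = fake_summarizer) {
  if (!"prompt" %in% names(article)) stop("Column 'prompt' is required.")
  n <- nrow(article)
  summaries <- character(n)
  # txtProgressBar() needs max > min, so guard against an empty input.
  pb <- utils::txtProgressBar(min = 0, max = max(n, 1), style = 3)
  on.exit(close(pb), add = TRUE)
  for (i in seq_len(n)) {
    out <- summarize(article$prompt[i])
    # Collapse newlines so each summary stays on one line in tables and CSVs.
    summaries[i] <- gsub("[\r\n]+", " ", out)
    utils::setTxtProgressBar(pb, i)
  }
  article$summary <- summaries
  class(article) <- c("article_summary", class(article))
  article
}

papers <- data.frame(prompt = c("First line\nsecond line", "One line"),
                     stringsAsFactors = FALSE)
summarized <- toy_add_summary(papers)
summarized$summary
```

Passing the summarizer as an argument keeps the loop testable without a running Ollama server.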
A modified data frame of class article_summary, including an additional column "summary" containing the generated summaries.
```r
## Not run:
papers <- get_articles(journal = "Nature Medicine")
papers_with_prompts <- add_prompt(papers, nsentences = 3)
summarized_papers <- add_summary(papers_with_prompts)
## End(Not run)
```
Builds a structured prompt from an article's title and abstract, designed for input to a language model. The prompt emphasizes extracting key findings, methodology, and tone, and is customizable via instructions.
```r
build_prompt(
  title,
  abstract,
  nsentences = 3L,
  instructions = c(
    "You will receive a paper's title and abstract as input.",
    "Provide a concise summary with exactly the number of sentences specified.",
    "Do not include introductory phrases or preamble text.",
    "Start directly with the summary; avoid any framing statements.",
    "Focus on key findings, especially of last two sentences of the abstract.",
    "If the abstract is missing, reply explicitly with 'Abstract is not available.'",
    "Highlight any novel contributions, claims, or innovations in the abstract.",
    "Mention main methods or datasets only if explicitly stated in the abstract.",
    "Indicate the strength and tone of evidence.",
    "Optionally,add a one-sentence lay summary for a non-specialist audience."
  )
)
```
| Argument | Description |
|---|---|
| `title` | Character string. The title of the article. |
| `abstract` | Character string. The abstract of the article. If unavailable, include a default message. |
| `nsentences` | Integer. The number of sentences required in the summary. Default is 3. Must be a positive whole number. |
| `instructions` | Character vector. A set of instructions guiding the summarization. Defaults to a structured template emphasizing main findings, methods, novelty, and tone. |
The generated prompt follows a structured format:

- It lists the instructions (customizable via `instructions`).
- It states the number of summary sentences required (`nsentences`).
- It embeds the article title and abstract.
- If the abstract is missing or not available, the prompt states this explicitly.
The default `instructions` vector can be modified to adapt the tone or focus of the summary, for example to prioritize methods or datasets, to report the strength and tone of the evidence, or to target non-specialist readers.
A character string representing a structured prompt for use with a language model summarization tool.
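The prompt structure described above (instructions, sentence count, title, abstract, and a missing-abstract fallback) can be sketched as plain string assembly. This is a minimal sketch, not the package's implementation. `sketch_build_prompt()` is hypothetical, its two-item default instruction list is shortened, and the exact layout and labels are assumptions.

```r
# Hypothetical prompt builder illustrating the documented structure.
sketch_build_prompt <- function(title, abstract, nsentences = 3L,
                                instructions = c(
                                  "You will receive a paper's title and abstract as input.",
                                  "Provide a concise summary with exactly the number of sentences specified."
                                )) {
  # nsentences must be a single positive whole number.
  if (!is.numeric(nsentences) || length(nsentences) != 1L ||
      is.na(nsentences) || nsentences < 1 || nsentences != round(nsentences)) {
    stop("`nsentences` must be a positive whole number.")
  }
  # Fall back to an explicit message when the abstract is missing or empty.
  if (is.null(abstract) || is.na(abstract) || !nzchar(trimws(abstract))) {
    abstract <- "Abstract is not available."
  }
  paste(
    paste("-", instructions, collapse = "\n"),
    sprintf("Number of sentences in summary: %d", as.integer(nsentences)),
    sprintf("Title: %s", title),
    sprintf("Abstract: %s", abstract),
    sep = "\n\n"
  )
}

cat(sketch_build_prompt("Deep Learning for Genomic Data Analysis", NA))
```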
```r
title <- "Deep Learning for Genomic Data Analysis"
abstract <- "This study explores deep learning in diverse tasks highlighting predictive accuracy."
prompt <- build_prompt(title, abstract, nsentences = 3)
cat(prompt)
```
Verify whether the Ollama backend is properly installed and running by testing the connection. If successful, retrieve and print the list of available local models.
```r
check_ollama(verbose = TRUE)
```
| Argument | Description |
|---|---|
| `verbose` | Logical. Should informative messages and the list of available models be printed to the console? Default is TRUE. |
The function calls ollamar::test_connection() to verify that the Ollama service is running, then calls ollamar::list_models() to check for installed local models. If verbose = TRUE, it prints detailed diagnostic messages and the model names.
Logical TRUE if Ollama is installed, running, and at least one model is available; otherwise FALSE.
```r
## Not run:
check_ollama()
## End(Not run)
```
This function filters a data frame of articles, retaining only those that contain at least one of the specified whitelist terms in either the title or abstract. This allows for easy extraction of articles relevant to a set of predefined topics.
```r
filter_articles(article, whitelist_terms)
```
| Argument | Description |
|---|---|
| `article` | A data frame or tibble containing at least the `title` and `abstract` columns. |
| `whitelist_terms` | A character vector of terms used to filter articles by matching the title or abstract. |
The function combines the "title" and "abstract" columns into a single text string and uses regular expression matching to search for the presence of any of the specified whitelist terms. The search is case-insensitive. Only the articles that match one or more of the whitelist terms will be retained in the output data frame.
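The matching rule described above can be sketched with `grepl()`. The terms are joined into one case-insensitive alternation pattern. This is a minimal sketch under that assumption, and `toy_filter_articles()` is a hypothetical name, not the package function.

```r
# Minimal sketch of whitelist filtering (assumed, not the package code).
toy_filter_articles <- function(article, whitelist_terms) {
  # Combine title and abstract into one searchable string per article.
  text <- paste(article$title, article$abstract)
  # Join terms as regex alternatives: "term1|term2|...".
  pattern <- paste(whitelist_terms, collapse = "|")
  keep <- grepl(pattern, text, ignore.case = TRUE)
  article[keep, , drop = FALSE]
}

papers <- data.frame(
  title = c("CRISPR screens in T cells", "A new telescope", "Outcomes of Gene Therapy"),
  abstract = c("We map regulators.", "We image galaxies.", NA),
  stringsAsFactors = FALSE
)
toy_filter_articles(papers, c("crispr", "gene therapy"))$title
```

Because the terms are treated as regular expressions in this sketch, a term containing metacharacters (for example `C++`) would need escaping first.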
A filtered data frame containing only articles where at least one of the whitelist terms is found in the title or abstract.
```r
## Not run:
papers <- get_articles(journal = "Nature Medicine")
filtered_papers <- filter_articles(papers, whitelist_terms = c("CRISPR", "gene therapy"))
## End(Not run)
```
This function scrapes articles from the current issue of a specified Nature journal, extracting article titles, URLs, and abstracts with robust fallback handling.
```r
get_articles(
  journal,
  article_selector = ".c-card.c-card--flush",
  title_selector = "h3 a",
  url_selector = "h3 a",
  abstract_selector = ".c-card__summary",
  verbose = FALSE
)
```
| Argument | Description |
|---|---|
| `journal` | Character string. The full name of the Nature journal (e.g., "Nature Biotechnology", "Nature Medicine"). |
| `article_selector` | Character string. CSS selector for locating articles on the journal's webpage. Default is ".c-card.c-card--flush". |
| `title_selector` | Character string. CSS selector for extracting article titles. Default is "h3 a". |
| `url_selector` | Character string. CSS selector for extracting article URLs. Default is "h3 a". |
| `abstract_selector` | Character string. CSS selector for extracting article abstracts. Default is ".c-card__summary". |
| `verbose` | Logical. If TRUE, prints messages about progress and internal steps. Default is FALSE. |
The journal argument is matched (case-insensitively) against available entries from nat_journals().
If not found, an informative error is thrown. Abstracts that are missing are replaced with "Abstract not available".
If titles, URLs, and abstracts differ in length, they are truncated to the shortest length with a warning.
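The fallback rules described above can be sketched as a small post-processing step on the scraped vectors. This step replaces missing abstracts and truncates to the shortest length with a warning. This is a minimal sketch, not the package's code. `reconcile_fields()` is hypothetical, the scraping itself is omitted, and the `source` column from the documented output is not reproduced.

```r
# Hypothetical helper reconciling scraped fields (assumed, not the package code).
reconcile_fields <- function(titles, urls, abstracts) {
  # Replace missing or empty abstracts with the documented placeholder.
  abstracts[is.na(abstracts) | !nzchar(trimws(abstracts))] <- "Abstract not available"
  lens <- c(length(titles), length(urls), length(abstracts))
  n <- min(lens)
  if (length(unique(lens)) > 1L) {
    warning("Titles, URLs and abstracts differ in length; truncating to ", n, ".")
  }
  data.frame(
    title = titles[seq_len(n)],
    url = urls[seq_len(n)],
    abstract = abstracts[seq_len(n)],
    stringsAsFactors = FALSE
  )
}

reconcile_fields(c("A", "B"), c("u1", "u2"), c("x", ""))
```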
A tibble with columns: title, url, abstract, and source. If no articles are found, returns an empty tibble.
```r
get_articles("Nature Biotechnology")
get_articles("Nature Reviews Genetics", verbose = TRUE)
```
This function returns a data frame of the Nature journals supported by the NatChat package, including their full names and URL slugs (used in links or for programmatic access). Optionally, users can provide a journal name to filter the list down to the matching journal and its slug.
```r
nat_journals(journal = NULL)
```
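| Argument | Description |
|---|---|
| `journal` | Optional character string. The full name of a Nature journal (case-insensitive) used to filter the list. If NULL (the default), all supported journals are returned. |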
The slug corresponds to the subdirectory used in Nature URLs (e.g., "https://www.nature.com/nbt/" for Nature Biotechnology).
Journal name matching is case-insensitive and supports exact matches only (no partial or fuzzy matching).
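The lookup rule described above (case-insensitive, exact match only) and the slug-to-URL mapping can be sketched with `tolower()`. This is a minimal sketch. The two-row table and its column names `journal` and `slug` are hypothetical, and stopping with an error on no match is an assumption of this sketch rather than documented behaviour.

```r
# Hypothetical two-row lookup table; the package's real table may differ.
journals <- data.frame(
  journal = c("Nature Biotechnology", "Nature Medicine"),
  slug = c("nbt", "nm"),
  stringsAsFactors = FALSE
)

toy_nat_journals <- function(journal = NULL, table = journals) {
  if (is.null(journal)) return(table)
  # Case-insensitive exact match: no partial or fuzzy matching.
  hit <- table[tolower(table$journal) == tolower(journal), , drop = FALSE]
  if (nrow(hit) == 0L) stop("Journal not found: ", journal)
  hit
}

slug <- toy_nat_journals("nature biotechnology")$slug
paste0("https://www.nature.com/", slug, "/")
```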
A tibble with two columns:

- the full name of the journal;
- the short URL identifier (slug) used in Nature journal web addresses.
```r
nat_journals()
nat_journals("Nature Medicine")
nat_journals("nature biotechnology")
```
Save a data frame of article metadata as a CSV file, an HTML file with a markdown-styled table, or both. Options control which formats are written and how verbose the function is.
```r
save_report(input, filename, save_csv, save_html, title, cols, width, verbose, outdir)
```
| Argument | Description |
|---|---|
| `input` | A data frame containing article data (e.g., "title", "summary", "url"). |
| `filename` | A character string specifying the base filename. |
| `save_csv` | Logical. Save output as a CSV file? Default is TRUE. |
| `save_html` | Logical. Save output as an HTML file? Default is TRUE. |
| `title` | A character string specifying the HTML page title. Default is "Article Summary Report". |
| `cols` | A character vector of column names to include in the output. Default is c("title", "summary"). |
| `width` | A numeric vector of column widths for the HTML table. Default is c(1, 3). |
| `verbose` | Logical. Should messages be printed? Default is TRUE. |
| `outdir` | A character string specifying the directory to save files. Default is the current working directory ".". |
A timestamp is appended to the base filename to uniquely identify each output. If both save_csv and save_html are FALSE, no files are saved and a message is issued (if verbose = TRUE).
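The timestamping and "nothing to save" behaviour described above can be sketched for the CSV branch. This is a minimal sketch, not the package's code. `toy_save_report()` is hypothetical, the HTML branch is omitted, and the `%Y%m%d_%H%M%S` timestamp format is an assumption.

```r
# Hypothetical CSV-only sketch of timestamped saving (assumed, not the package code).
toy_save_report <- function(input, filename = "report", save_csv = TRUE,
                            outdir = tempdir(), verbose = TRUE) {
  if (!save_csv) {
    if (verbose) message("Nothing to save: save_csv is FALSE.")
    return(invisible(list()))
  }
  # Append a timestamp so each run writes a distinct file.
  stamp <- format(Sys.time(), "%Y%m%d_%H%M%S")
  path <- file.path(outdir, paste0(filename, "_", stamp, ".csv"))
  utils::write.csv(input, path, row.names = FALSE)
  if (verbose) message("Saved CSV: ", path)
  invisible(list(csv = path))
}

paths <- toy_save_report(data.frame(title = "A", summary = "B"))
```

With a one-second timestamp, two calls made within the same second would overwrite each other. A finer-grained stamp would avoid that if it matters.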
Files are written to disk in the specified formats. The function returns (invisibly) a list of saved file paths.
```r
## Not run:
papers <- get_articles(journal = "Nature Medicine")
papers_with_prompts <- add_prompt(papers)
papers_with_summary <- add_summary(papers_with_prompts)
save_report(papers_with_summary, save_csv = TRUE, save_html = TRUE)
## End(Not run)
```
Retrieve and summarize abstracts from the current issue of a selected Nature Portfolio journal using a local LLM and save the output as CSV and/or HTML. Optionally filter the articles by a set of whitelist terms.
```r
summarize_journal(journal, filename, outdir, model, save_csv, save_html, verbose, whitelist)
```
| Argument | Description |
|---|---|
| `journal` | A character string indicating the name of the supported Nature journal (e.g., "Nature Biotechnology"). |
| `filename` | A character string specifying the base filename for saving the report. Default is "natchat_summary". |
| `outdir` | A character string specifying the directory to save output files. Default is the current working directory ".". |
| `model` | A character string specifying the local Ollama model to use for summarization (e.g., "llama3:instruct"). |
| `save_csv` | Logical. Save the results as a CSV file? Default is TRUE. |
| `save_html` | Logical. Save the results as an HTML file? Default is TRUE. |
| `verbose` | Logical. Should informative messages be printed to the console? Default is TRUE. |
| `whitelist` | Optional character vector of terms used to filter articles based on title and abstract. Default is NULL (no filtering). |
This function is a convenience wrapper around get_articles(), add_prompt(), add_summary(), and save_report().
It scrapes the current issue, optionally filters articles using a whitelist of terms, summarizes abstracts using a local LLM, and exports the result.
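The wrapper described above roughly corresponds to chaining the documented functions by hand. This is a sketch under stated assumptions: the package namespace is `NatChat`, an Ollama server is running with the chosen model available, and `manual_pipeline()` is a hypothetical name. It is not the wrapper's actual source.

```r
# Hypothetical manual equivalent of summarize_journal(); assumes NatChat is
# installed and a local Ollama server has the chosen model pulled.
manual_pipeline <- function(journal, whitelist = NULL, model = "llama3.1",
                            outdir = ".") {
  # 1. Scrape the current issue.
  papers <- NatChat::get_articles(journal)
  # 2. Optionally keep only articles matching the whitelist terms.
  if (!is.null(whitelist)) {
    papers <- NatChat::filter_articles(papers, whitelist_terms = whitelist)
  }
  # 3. Build prompts, then 4. summarize them with the local LLM.
  papers <- NatChat::add_prompt(papers, nsentences = 3)
  papers <- NatChat::add_summary(papers, model = model)
  # 5. Export; save_report() returns the saved paths invisibly.
  NatChat::save_report(papers, filename = "natchat_summary", outdir = outdir)
}
```

Running the steps separately like this makes it possible to inspect the intermediate data frames, for example to check the prompts before spending time on LLM calls.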
Invisibly returns a list of the saved file paths (if any files were saved). The summarized article metadata is generated and, optionally, written to disk.
```r
## Not run:
summarize_journal(
  journal = "Nature Medicine",
  model = "llama3",
  whitelist = c("CRISPR", "gene therapy"),
  save_csv = TRUE,
  save_html = TRUE
)
## End(Not run)
```
This function formats a data frame of articles into a markdown-styled table using the tinytable package.
It allows you to select specific columns from the article data and adjust the column widths for a clean, formatted output.
```r
tt_article(article, cols = c("title", "summary"), width = c(1, 3))
```
| Argument | Description |
|---|---|
| `article` | A data frame containing article information (e.g., "title", "summary", "url"). |
| `cols` | A character vector of column names to include in the table. Default is c("title", "summary"). |
| `width` | A numeric vector specifying column widths. Default is c(1, 3). |
The function first ensures that the specified columns exist in the data frame and that the length of cols matches the length of width.
It then formats the "title" column as a markdown link, using the article's URL (if provided), and selects the requested columns to be displayed in the table.
The resulting table is formatted as markdown for easy integration into markdown environments.
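The title-to-link step described above can be sketched with `sprintf()`, leaving titles unchanged where no URL is available. This is a minimal sketch. `make_title_links()` is a hypothetical helper, and the subsequent tinytable formatting is not reproduced.

```r
# Hypothetical helper: turn titles into markdown links where a URL exists.
make_title_links <- function(article) {
  if ("url" %in% names(article)) {
    ok <- !is.na(article$url) & nzchar(article$url)
    article$title[ok] <- sprintf("[%s](%s)", article$title[ok], article$url[ok])
  }
  article
}

papers <- data.frame(
  title = c("Paper A", "Paper B"),
  url = c("https://example.org/a", NA),
  stringsAsFactors = FALSE
)
make_title_links(papers)$title
```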
A formatted markdown table as a character string, ready to be displayed in markdown-supported environments.
```r
## Not run:
papers <- get_articles(journal = "Nature Biotechnology")
papers_with_prompts <- add_prompt(papers)
papers_with_summary <- add_summary(papers_with_prompts)
tt_article(papers_with_summary)
## End(Not run)
```