NEWS

EnTraineR 1.0.0 (2026-01-17)

First major release.
This version consolidates the API, improves robustness for FactoMineR outputs, and harmonizes audience-tailored prompts across all functions.

Major Changes & New Features

Unified Architecture (“Hub & Spoke”)

All trainer functions now rely on a centralized core (trainer_core.R) for consistent prompt building, audience profiling, output capture, and regex/quoting helpers, so a change to the core applies to every trainer at once.

New Trainers

trainer_t_test(): Interpret t-tests (one-sample, two-sample Student or Welch, paired) with audience-aware guidance on p-values vs confidence intervals.
trainer_cor_test(): Interpret correlation tests (Pearson, Spearman, Kendall), clearly distinguishing statistical significance from practical magnitude.
trainer_var_test(): Interpret F-tests for the equality of two variances.

Robust ANOVA & Linear Model Support

trainer_AovSum() and trainer_LinearModel() use centralized heuristics (trainer_core_extract_tables_heuristic) to reliably extract F-test and T-test tables even when standard headers are missing (common with capture.output).
Strict, space-safe T-table filtering: factor names with spaces or special characters are handled via core quoting (trainer_core_quote) and detection helpers.
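The filtering described above can be sketched in Python as follows. This is an illustration of the technique, not EnTraineR's R code: the function name filter_t_table, the sample rows, and the assumed row layout (label first, then whitespace) are all hypothetical. Term names are regex-escaped so spaces and characters such as parentheses match literally, and the header is removed from the rows and re-attached exactly once, so filtering twice cannot duplicate or lose it.

```python
import re
from typing import List, Sequence


def filter_t_table(lines: Sequence[str], terms: Sequence[str]) -> List[str]:
    """Keep the column header once, plus the rows belonging to `terms`.

    Hypothetical layout: lines[0] is the header of a captured T-test table and
    every other row starts with its label followed by whitespace.
    """
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    if not terms:
        return [header]
    # re.escape makes spaces, parentheses, '+', ':' etc. match literally.
    alternatives = "|".join(re.escape(t) for t in terms)
    # The lookahead stops "sweet taste" from also matching "sweet tasteless".
    pattern = re.compile(r"\s*(?:" + alternatives + r")(?=\s|$)")
    # Drop any stray copies of the header, then re-attach it exactly once.
    kept = [row for row in rows if row != header and pattern.match(row)]
    return [header, *kept]


# Usage with made-up rows (not real FactoMineR output).
header = "                   Estimate  Std.Error  t.value  p.value"
table = [
    header,
    "(Intercept)            10.0       0.5     20.0    0.000",
    "sweet taste - high      0.5       0.1      5.0    0.001",
    "sweet taste - low      -0.5       0.1     -5.0    0.001",
    "sweet tasteless         0.1       0.1      1.0    0.320",
]
filtered = filter_t_table(table, ["sweet taste", "(Intercept)"])
```

Without re.escape, "(Intercept)" would be read as a regex group, and the literal parenthesis at the start of the row would never match.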

Gemini Integration

gemini_generate() enables direct interaction with Google’s Gemini API.
Includes automatic retry with exponential backoff and clear error surfacing (finish reasons, safety blocks).
Adds compile_to = "html" and compile_to = "docx" to render LLM responses directly into reports.
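Retry with exponential backoff, as mentioned above, doubles the wait after each failed attempt. The Python sketch below shows the generic technique only. It is not the package's implementation, and the names retry_with_backoff and TransientError, the default delays, and the choice of which failures count as retryable (for example HTTP 429 or 503) are assumptions. The sleep function is injectable so the waits can be inspected without actually pausing.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (e.g. HTTP 429 or 503)."""


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying TransientError with exponentially growing waits.

    The wait before retry k (k = 1, 2, ...) is base_delay * 2 ** (k - 1).
    Any other exception propagates at once; once max_attempts is reached the
    last TransientError is re-raised so the caller sees the real cause.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return call()
        except TransientError:
            if attempt >= max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1


# Usage: a call that fails twice, then succeeds; record waits instead of sleeping.
waits: List[float] = []
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("503 Service Unavailable")
    return "ok"


result = retry_with_backoff(flaky, sleep=waits.append)
```

Re-raising the last error, rather than wrapping it, keeps the original cause (such as a finish reason or safety block message) visible to the caller.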

Improvements

Pedagogy & Interpretability

PCA/MCA: Prompts explicitly guide interpretation of correlation signs (PCA) and category positions (MCA) to reduce hallucinations.
ANOVA: “How-to-read” blocks clarify deviation (sum-to-zero) coding, under which the Intercept is the grand mean (strictly, the unweighted mean of the cell means; the two coincide in balanced designs).
t-test: Clear separation between Inference (p-value) and Magnitude (effect size/mean difference) in prompts.
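For a one-way ANOVA, the link between sum-to-zero coding and the grand mean can be checked directly. The derivation below assumes the usual one-way model with I groups of sizes n_i and is included only as a worked illustration of the ANOVA note above.

```latex
\begin{aligned}
&y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1,\dots,I,\; j = 1,\dots,n_i, \qquad \sum_{i=1}^{I} \alpha_i = 0,\\
&\hat\mu + \hat\alpha_i = \bar y_{i\cdot} \quad \text{(least squares reproduces each group mean)},\\
&\sum_{i=1}^{I} (\hat\mu + \hat\alpha_i) = I\hat\mu = \sum_{i=1}^{I} \bar y_{i\cdot}
 \;\Longrightarrow\; \hat\mu = \frac{1}{I}\sum_{i=1}^{I} \bar y_{i\cdot}, \qquad \hat\alpha_i = \bar y_{i\cdot} - \hat\mu,\\
&n_i = n \ \text{for all } i \;\Longrightarrow\; \hat\mu = \frac{1}{nI}\sum_{i=1}^{I}\sum_{j=1}^{n} y_{ij} = \bar y_{\cdot\cdot}.
\end{aligned}
```

With unequal group sizes, the Intercept is therefore the unweighted mean of the group means rather than the overall mean of all observations, and each coefficient is a deviation of its group mean from that Intercept.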

Developer Experience

Internal helpers for regex, output capture, and quoting are centralized, removing duplicated code.
Improved error messages when input objects are NULL or of the wrong class.
Audience sections standardized (BEGINNER/APPLIED/ADVANCED) with consistent limits (e.g., Takeaway ≤ 5 bullets).

Bug Fixes

Fixed extract_section duplication and T-table header-loss issues by using the shared core heuristic and re-attaching the T-table header exactly once after filtering.
Fixed potential regex failures with factor names containing spaces in trainer_AovSum.
Removed unnecessary stringr dependency in examples.

EnTraineR 0.9.0

Major pre-CRAN update with bug fixes, improved prompts, and Gemini API support.

Highlights

New: gemini_generate() — minimal, robust wrapper for Google Gemini (Generative Language API) to programmatically generate LLM responses from R.
- Normalizes model ids (accepts "gemini-2.5-flash" or "models/gemini-2.5-flash").
- Correct endpoint (v1beta/models/{model}:generateContent) and query key.
- Config params: temperature, top_p, top_k, max_output_tokens, stop_sequences, system_instruction, seed, timeout, verbose.
- Adds a polite User-Agent header by default: EnTraineR/0.9.0 (https://github.com/Sebastien-Le/EnTraineR).
- Safer parsing: handles empty/blocked candidates and reports finishReason or safety blocks with clear errors.
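The model-id normalization and endpoint construction listed above can be sketched in Python. The helper names normalize_model_id and generate_content_url are hypothetical, and this is not the package's R code. The base URL is the public Generative Language API v1beta host, the key is passed as a query parameter as the notes above describe, and the GEMINI_API_KEY fallback mirrors the default key lookup mentioned under the internal notes for this release.

```python
import os
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def normalize_model_id(model: str) -> str:
    """Accept "gemini-2.5-flash" or "models/gemini-2.5-flash"; return the bare id."""
    model = model.strip()
    if model.startswith("models/"):
        model = model[len("models/"):]
    if not model:
        raise ValueError("model id is empty")
    return model


def generate_content_url(model: str, api_key: Optional[str] = None) -> str:
    """Build the v1beta generateContent URL with the key as a query parameter."""
    # Fall back to the GEMINI_API_KEY environment variable when no key is given.
    key = api_key or os.environ.get("GEMINI_API_KEY")
    if not key:
        raise ValueError("no API key: pass api_key or set GEMINI_API_KEY")
    # safe="" percent-encodes any '/' left in the id so it cannot alter the path.
    model_path = quote(normalize_model_id(model), safe="")
    return f"{BASE_URL}/models/{model_path}:generateContent?{urlencode({'key': key})}"
```

Normalizing to the bare id first means both accepted spellings map to exactly one URL, so the "models/" prefix is never doubled.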

Bug fixes & robustness

Fixed occasional header-loss in filtered ANOVA T-test tables; always re-attaches the column header exactly once when filtering.
Resolved errors in trainer_LinearModel() when global-fit lines were only partially printed (missing df or p); parsing is now defensive against missing fields.
Stabilized audience sections across trainers (BEGINNER/APPLIED/ADVANCED) with consistent wording and limits (e.g., “Takeaway ≤ 5 bullets”).
Cleaned up duplicate attachment issues in verbatim blocks.
Improved error messages when external APIs return 404 or safety-filtered results.

Documentation & examples

Expanded examples for:
- trainer_LinearModel() (ham and deforestation case studies).
- trainer_AovSum() (sensory chocolates and poussin datasets).
Added dataset documentation for deforestation, ham, and poussin, with examples guarded against missing optional dependencies.
Improved DESCRIPTION (clearer package blurb) and added httr2 to Imports to satisfy namespace checks for gemini_generate().

Internal / developer notes

Consistent prompt skeleton (context → setup/how-to-read → verbatim → output requirements).
Audience profiles aligned between ANOVA and LinearModel (hierarchy rules, deviation coding, partial vs global tests).
Safer handling of API keys: the GEMINI_API_KEY environment variable is looked up by default.

EnTraineR 0.1.0

First CRAN release.

New features

Audience-aware prompt builders for common analyses:
- trainer_AovSum() for ANOVA tables (FactoMineR::AovSum).
- trainer_LinearModel() for multiple linear regression (FactoMineR::LinearModel) with global fit summary + per-term F/T sections and optional AIC/BIC selection.
- trainer_t_test() for stats::t.test.
- trainer_var_test() for stats::var.test.
- trainer_prop_test() for stats::prop.test.
- trainer_cor_test() for stats::cor.test (Pearson/Spearman/Kendall).
- trainer_chisq_test() for stats::chisq.test (GOF and contingency tables).
All trainers support three audiences: "beginner", "applied", "advanced".
- Consistent language, alpha handling, and no invented numbers.
- Optional summary_only = TRUE for 3-bullet executive summaries.

Data

Added three documented datasets for teaching and examples:
- deforestation: water/air temperatures before vs after riparian clearing.
- ham: sensory descriptors and overall liking (multiple regression).
- poussin: chick weight by brooding temperature and sex (ANOVA).

Quality and robustness

Clear separation of orientation/setup, verbatim output, and output requirements.
ANOVA T-test section:
- Correctly re-attaches column header after filtering; clear “(filtered)” title and scope message.
LinearModel:
- Includes verbatim model-fit summary (RSE, R², F, p, AIC/BIC).
- Explains partial (added last) F/T tests and interaction hierarchy.
- Selection notes (AIC/BIC): kept/dropped RHS terms and deltas (AIC/BIC/RSE/df).
Proportion and correlation tests:
- Audience-specific guidance, CI wording, and small-sample cautions.
Chi-squared test:
- Branches for GOF vs contingency; notes on Yates correction and Monte Carlo p-values.
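The Yates correction mentioned above shrinks each |O − E| before squaring. The pure-Python sketch below computes the Pearson statistic for a 2x2 table with and without it. The function name chisq_2x2 is hypothetical. Capping the correction at |O − E| is, as far as we know, the same rule R's stats::chisq.test applies to 2x2 tables when correct = TRUE (its default), but this sketch is not the package's code.

```python
from typing import Sequence


def chisq_2x2(table: Sequence[Sequence[float]], correct: bool = True) -> float:
    """Pearson chi-squared statistic for a 2x2 table, optionally Yates-corrected."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    if min(row_sums + col_sums) <= 0:
        raise ValueError("all row and column totals must be positive")
    n = sum(row_sums)
    expected = [[row_sums[i] * col_sums[j] / n for j in range(2)] for i in range(2)]
    # In a 2x2 table |O - E| is the same in every cell.
    abs_dev = abs(table[0][0] - expected[0][0])
    # Shrink each |O - E| by 0.5, but never past zero.
    yates = min(0.5, abs_dev) if correct else 0.0
    return sum(
        (abs(table[i][j] - expected[i][j]) - yates) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )


# Usage: every expected count is 15 and |O - E| = 5 in each cell.
observed = [[10, 20], [20, 10]]
uncorrected = chisq_2x2(observed, correct=False)  # 4 * 5**2 / 15
corrected = chisq_2x2(observed)                   # 4 * 4.5**2 / 15
```

The corrected statistic is smaller, so its p-value is more conservative; this matters most when counts are small.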

Documentation

Roxygen2 docs for all trainers and datasets.
Examples guard optional dependencies with requireNamespace("FactoMineR", quietly = TRUE).

Internals

Shared prompt skeleton via trainer core helpers (audience profile, headers, formatting).
Consistent phrasing across trainers; improved error checks on inputs.