| Title: | Automated Statistical Analysis and Table Generation for Biomedical Research |
|---|---|
| Description: | Generates publication-ready summary tables for clinical research, supporting descriptive summaries and comparisons across two or three groups. The package streamlines the analytical workflow by detecting variable types and applying appropriate statistical tests (Welch t-test, Wilcoxon rank-sum, Welch ANOVA, Kruskal-Wallis, Chi-squared, or Fisher's exact test). Results are formatted as 'tibble' objects and can be exported to 'Word' or 'Excel' using the 'officer', 'flextable', and 'writexl' packages. Optional pairwise post-hoc testing for three-group comparisons (Games-Howell and Dunn's test) is available via the 'rstatix' package. Example data are derived from the landmark adjuvant colon cancer trial described in Moertel et al. (1990) <doi:10.1056/NEJM199002083220602>. |
| Authors: | Joshua D. Preston [aut, cre] (ORCID: <https://orcid.org/0000-0001-9834-3017>), Helen Abadiotakis [aut] (ORCID: <https://orcid.org/0009-0002-8268-927X>), Ailin Tang [aut] (ORCID: <https://orcid.org/0009-0007-8715-1678>), Clayton J. Rust [aut] (ORCID: <https://orcid.org/0000-0001-5929-0733>), Michael E. Halkos [aut] (ORCID: <https://orcid.org/0000-0001-9191-7743>), Mani A. Daneshmand [aut] (ORCID: <https://orcid.org/0000-0002-0191-9911>), Joshua L. Chan [aut] (ORCID: <https://orcid.org/0000-0001-7220-561X>) |
| Maintainer: | Joshua D. Preston <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.7.2 |
| Built: | 2026-06-04 20:02:22 UTC |
| Source: | https://github.com/jdpreston30/TernTables |
Applies the same normality assessment logic used internally by ternG()
and ternD() and returns a tidy tibble showing per-variable (and
per-group) statistics, the gate that triggered the routing decision, and the
final parametric / non-parametric routing outcome.
classify_normality(
  data,
  vars = NULL,
  exclude_vars = NULL,
  group_var = NULL,
  consider_normality = "ROBUST"
)
data |
A data frame or tibble. |
vars |
Optional character vector of variable names to assess. If
|
exclude_vars |
Optional character vector of variable names to exclude. |
group_var |
Optional name of the grouping variable (as used in
|
consider_normality |
Normality assessment mode — must match what was
(or will be) passed to |
Useful for:
Answering reviewer questions about normality testing ("was Age normally distributed?").
Verifying that a given variable's routing matches your expectation
before running ternG() or ternD().
Generating a supplemental normality audit table for a manuscript.
A tibble with one row per variable-group combination (or one row per
variable when group_var = NULL), containing:
Variable name.
Group level, or "[all]" when no group_var is
supplied.
Non-missing sample size in this group.
Sample skewness (population moments).
Excess kurtosis (population moments; 0 for a normal distribution).
Shapiro-Wilk p-value for this group. NA when the
routing decision was made at Gates 1–3 under "ROBUST", when
n is outside the valid range (3–5000), or when
consider_normality = FALSE.
Integer 1–4 indicating which gate made the routing
decision under consider_normality = "ROBUST", or NA
for TRUE / FALSE modes.
Plain-language explanation of the gate decision, naming which group(s) triggered the rule where relevant.
Logical; TRUE = routed to parametric
(mean ± SD, t-test / ANOVA); FALSE = non-parametric
(median [IQR], Wilcoxon / Kruskal-Wallis).
Human-readable routing summary:
"Parametric (mean ± SD)" or
"Non-parametric (median [IQR])".
data(tern_colon)

# Single-group audit (ternD-style)
classify_normality(tern_colon, exclude_vars = "ID")

# Grouped audit matching a ternG call
classify_normality(tern_colon, exclude_vars = "ID", group_var = "Recurrence")

# Specific variables only
classify_normality(tern_colon,
                   vars = c("Age", "Positive_Lymph_Nodes_n"),
                   group_var = "Recurrence")

# Using Shapiro-Wilk only (matches consider_normality = TRUE in ternG/ternD)
classify_normality(tern_colon, exclude_vars = "ID", group_var = "Recurrence",
                   consider_normality = TRUE)
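The skewness, excess kurtosis, and Shapiro-Wilk columns described above can be approximated in base R. The sketch below is not the package's internal code: the helper name moment_stats() is hypothetical, and it only illustrates population-moment formulas together with the 3 to 5000 sample-size guard that shapiro.test() requires. The gate thresholds used under "ROBUST" are not given in this section and are not reproduced.

```r
# Hypothetical helper: population-moment skewness and excess kurtosis,
# plus a Shapiro-Wilk p-value that is NA outside the valid n range.
moment_stats <- function(x) {
  x  <- x[!is.na(x)]
  n  <- length(x)
  m  <- mean(x)
  m2 <- mean((x - m)^2)   # population variance (divides by n)
  m3 <- mean((x - m)^3)
  m4 <- mean((x - m)^4)
  # shapiro.test() needs 3 to 5000 non-missing, non-constant values
  sw_p <- if (n >= 3 && n <= 5000 && m2 > 0) shapiro.test(x)$p.value else NA_real_
  c(n = n,
    skewness = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3,   # 0 for a normal distribution
    shapiro_p = sw_p)
}

set.seed(1)
moment_stats(rnorm(200))   # roughly symmetric sample
moment_stats(rexp(200))    # right-skewed sample
```

Comparing these values with the corresponding classify_normality() columns is one way to sanity-check an audit table by hand.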
Re-displays the preprocessing summary for a ternP_result object.
Note that ternP already emits this summary automatically at
the time it is called, so this method is most useful for reviewing the
summary after the fact (e.g. typing result at the console later
in a session).
## S3 method for class 'ternP_result'
print(x, ...)
x |
A |
... |
Currently unused; included for S3-method compatibility. |
Invisibly returns x.
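For readers unfamiliar with the pattern, the following sketch shows how an S3 print method that returns its argument invisibly behaves. The class "demo_result" and its constructor are hypothetical stand-ins, not part of TernTables.

```r
# Hypothetical class used only to illustrate the S3 print pattern.
new_demo_result <- function(n_rows) {
  structure(list(n_rows = n_rows), class = "demo_result")
}

print.demo_result <- function(x, ...) {
  cat("Rows retained:", x$n_rows, "\n")
  invisible(x)   # hand the object back without printing it a second time
}

res <- new_demo_result(929)
res                  # auto-printing dispatches to print.demo_result()
same <- print(res)   # prints once; `same` is identical to `res`
```

Returning the object invisibly keeps the method usable inside pipelines and assignments.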
A processed subset of the colon dataset restricted to the
recurrence endpoint (etype == 1), providing one row per patient.
Variables have been relabelled with clinically descriptive names and
factor levels suitable for direct use in TernTables functions. This dataset
is provided as a ready-to-use example for demonstrating ternD() and
ternG() functionality.
tern_colon
A tibble with 929 rows and 12 variables:
Integer patient identifier.
Age at study entry (years).
Patient sex: "Female" or "Male".
Colonic obstruction present: "N" or "Y".
Bowel perforation present: "N" or "Y".
Number of positive lymph nodes detected.
More than 4 positive lymph nodes: "N" or "Y".
Tumour adherence to surrounding organs: "N" or "Y".
Tumour differentiation grade: "Well",
"Moderate", or "Poor".
Depth of tumour penetration: "Submucosa",
"Muscle", "Serosa", or "Contiguous Structures".
Recurrence status: "No Recurrence" or "Recurrence".
Randomised treatment: "Levamisole + 5FU",
"Levamisole", or "Observation".
Derived from colon (Laurie et al., 1989).
See colon for full provenance.
Pre-processing script: data-raw/tern_colon.R.
data(tern_colon)
head(tern_colon)
Takes a list of tibbles previously created by ternD() or ternG()
and writes them all into one .docx file, one table per page, preserving
the exact formatting settings that were used when each table was built.
ternB(
  tables,
  output_docx,
  page_break = TRUE,
  methods_doc = FALSE,
  methods_filename = "TernTables_methods.docx",
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial")
)
tables |
A list of tibbles created by |
output_docx |
Output file path ending in |
page_break |
Logical; if |
methods_doc |
Logical; if |
methods_filename |
Output file path for the methods document. Defaults
to |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family for all Word output. Any font name accepted by
the rendering system is valid. Can also be set via
|
ternB() works by replaying the exact word_export() call that
ternD() / ternG() would have made – using stored metadata
attached as an attribute to each returned tibble – but directing all output
into a single combined document instead of separate files.
Table captions (table_caption) and footnotes (table_footnote) specified in the original
ternD() / ternG() call are reproduced automatically. You can
override them by modifying the "ternB_meta" attribute before calling
ternB(), though in practice it is easier to set captions and footnotes when you
first build each table.
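The mechanics of reading and rewriting an attribute can be sketched in base R. The layout of "ternB_meta" is not documented in this section, so the field names below (table_caption, table_footnote) are assumptions borrowed from the argument names, and a plain data frame stands in for a real ternD() / ternG() result.

```r
# Stand-in for a table returned by ternD() / ternG().
tbl <- data.frame(Variable = c("A", "B"), Total = c("1", "2"))

# Hypothetical metadata layout; the real "ternB_meta" structure may differ.
attr(tbl, "ternB_meta") <- list(table_caption  = "Table 1. Draft caption.",
                                table_footnote = NULL)

meta <- attr(tbl, "ternB_meta")            # read the stored settings
meta$table_caption <- "Table 1. Final caption."
attr(tbl, "ternB_meta") <- meta            # write the modified copy back

attr(tbl, "ternB_meta")$table_caption
```

Inspecting the attribute of a real result with str(attr(T1, "ternB_meta")) before editing it is the safer route.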
Invisibly returns the path to the written Word file.
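The font_family default shown in the usage is read with getOption(), so a session-wide font can presumably be set through options(). The sketch uses only base R; "Times New Roman" is an arbitrary example value.

```r
# Set a session-wide default, remembering the previous state.
old <- options(TernTables.font_family = "Times New Roman")
during <- getOption("TernTables.font_family", "Arial")   # "Times New Roman"

# Restore whatever was set before.
options(old)
after <- getOption("TernTables.font_family", "Arial")    # previous value, or "Arial" if unset
```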
data(tern_colon)

T1 <- ternD(tern_colon, exclude_vars = "ID",
            table_caption = "Table 1. Overall patient characteristics.",
            methods_doc = FALSE, open_doc = FALSE)

T2 <- ternG(tern_colon, group_var = "Recurrence", exclude_vars = "ID",
            table_caption = "Table 2. Characteristics by recurrence status.",
            methods_doc = FALSE, open_doc = FALSE)

ternB(list(T1, T2),
      output_docx = file.path(tempdir(), "combined_tables.docx"),
      open_doc = FALSE)
Creates a descriptive summary table with a single "Total" column format.
By default (consider_normality = "ROBUST"), continuous variables are shown
as mean +/- SD or median [IQR] based on a four-gate decision (n < 3 fail-safe, skewness/kurtosis, CLT, and Shapiro-Wilk).
This can be overridden via consider_normality and force_ordinal.
ternD(
  data,
  vars = NULL,
  exclude_vars = NULL,
  force_ordinal = NULL,
  force_normal = NULL,
  force_continuous = NULL,
  output_xlsx = NULL,
  output_docx = NULL,
  consider_normality = "ROBUST",
  print_normality = FALSE,
  round_intg = FALSE,
  round_decimal = NULL,
  smart_rename = TRUE,
  insert_subheads = TRUE,
  factor_order = "mixed",
  methods_doc = TRUE,
  methods_filename = "TernTables_methods.docx",
  category_start = NULL,
  plain_header = NULL,
  table_font_size = 9,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  table_caption = NULL,
  table_footnote = NULL,
  abbreviation_footnote = NULL,
  variable_footnote = NULL,
  index_style = "symbols",
  line_break_header = getOption("TernTables.line_break_header", TRUE),
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial"),
  show_missing = FALSE,
  zero_to_dash = FALSE,
  show_missingness = FALSE,
  missing_indicators = NULL
)
data |
Tibble with variables. |
vars |
Character vector of variables to summarize. Defaults to all except |
exclude_vars |
Character vector to exclude from the summary. |
force_ordinal |
Character vector of variables to treat as ordinal (i.e., use median [IQR])
regardless of the |
force_normal |
Character vector of variable names to treat as normally distributed, bypassing all
normality assessment. Listed variables are summarized as mean |
force_continuous |
Character vector of variables to force treatment as continuous (mean |
output_xlsx |
Optional Excel filename to export the table. |
output_docx |
Optional Word filename to export the table. |
consider_normality |
Character or logical; controls routing of continuous variables to
mean |
print_normality |
Logical; if |
round_intg |
Logical; if |
round_decimal |
Integer or |
smart_rename |
Logical; if |
insert_subheads |
Logical; if |
factor_order |
Character; controls the ordering of factor levels in the output.
|
methods_doc |
Logical; if |
methods_filename |
Character; filename for the methods document.
Default is |
category_start |
Named character vector specifying where to insert category headers.
Names are the header label text to display; values are the anchor variable – either the
original column name (e.g. |
plain_header |
Named character vector, same interface as |
table_font_size |
Numeric; font size for Word document output tables. Default is 9. |
manual_italic_indent |
Character vector of display variable names (post-cleaning) that should be
formatted as italicized and indented in Word output – matching the appearance of factor sub-category
rows. Has no effect on the returned tibble; only applies when |
manual_underline |
Character vector of display variable names (post-cleaning) that should be
formatted as underlined in Word output – matching the appearance of multi-category variable headers.
Has no effect on the returned tibble; only applies when |
table_caption |
Optional character string for a table caption to display above the table in
the Word document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table.
Default is |
table_footnote |
Optional character string for a footnote to display below the table in the
Word document. Rendered as size 6 Arial italic with a double-bar border above and below.
Default is |
abbreviation_footnote |
Optional character string listing abbreviations. Always printed
first in the footnote block. Default |
variable_footnote |
Optional named character vector. Names are display variable names
(case-insensitive); values are the footnote definition text. Each variable gets the next
symbol appended to its name in the table, and the footnote block lists each definition
below the abbreviation line. To share one footnote between multiple variables, separate
their names with a pipe: |
index_style |
Character; |
line_break_header |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family name used for all Word output (table,
captions, footnotes, methods document). Any font installed on the system that
renders the document may be used. Popular options include |
show_missing |
Logical; if |
zero_to_dash |
Logical; if |
show_missingness |
Controls whether a |
missing_indicators |
Optional character vector of string values to treat as missing
in addition to (or instead of) the built-in ternP defaults. When |
The function always returns a tibble with a single Total (N = n) column format, regardless of the
consider_normality setting. The behavior for numeric variables follows this priority:
Variables in force_ordinal: Always use median [IQR]
When consider_normality = "ROBUST": Four-gate decision (n<3 fail-safe, skewness/kurtosis, CLT, Shapiro-Wilk)
When consider_normality = TRUE: Use Shapiro-Wilk test to choose format
When consider_normality = FALSE: Default to mean +/- SD
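The priority order above can be sketched as a small decision function. This is an illustration only: choose_format() is a hypothetical name, the 0.05 cut-off for the Shapiro-Wilk branch is an assumption, and the four-gate "ROBUST" logic is deliberately left unimplemented because its thresholds are not given here.

```r
# Hypothetical sketch of the routing priority for one numeric variable.
choose_format <- function(x, is_forced_ordinal = FALSE,
                          consider_normality = FALSE, alpha = 0.05) {
  if (is_forced_ordinal) return("median [IQR]")          # highest priority
  if (identical(consider_normality, "ROBUST")) {
    return(NA_character_)                                # four-gate logic not reproduced
  }
  if (isTRUE(consider_normality)) {
    p <- shapiro.test(x[!is.na(x)])$p.value
    return(if (p > alpha) "mean +/- SD" else "median [IQR]")   # alpha is assumed
  }
  "mean +/- SD"                                          # consider_normality = FALSE
}

skewed <- (1:50)^4
choose_format(skewed, is_forced_ordinal = TRUE)
choose_format(skewed, consider_normality = TRUE)
choose_format(skewed, consider_normality = FALSE)
```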
For categorical variables, the function shows frequencies and percentages. When
insert_subheads = TRUE, categorical variables with 3 or more levels are displayed with
hierarchical formatting (main variable as header, levels as indented sub-rows). Binary variables
(Y/N, YES/NO, or numeric 1/0 auto-detected as Y/N) always use a single-row format showing
only the positive/yes count, regardless of this setting. Two-level categorical variables whose
values are not Y/N, YES/NO, or 1/0 (e.g. Male/Female) also use the hierarchical sub-row format.
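As a rough illustration of these display rules, the sketch below classifies a categorical vector as single-row or hierarchical. The function name row_layout() is hypothetical, matching values case-insensitively is an assumption, and the layout used when insert_subheads = FALSE is not described in this section, so the sketch does not guess it.

```r
# Hypothetical sketch of the row-layout rules for categorical variables.
row_layout <- function(x, insert_subheads = TRUE) {
  vals <- toupper(trimws(as.character(unique(x[!is.na(x)]))))
  binary_sets <- list(c("N", "Y"), c("NO", "YES"), c("0", "1"))
  is_binary <- any(vapply(binary_sets, function(s) all(vals %in% s), logical(1)))
  if (is_binary) return("single row (positive count only)")
  if (insert_subheads) return("header + indented sub-rows")
  "layout not described in this section"
}

row_layout(c("Y", "N", "Y"))               # Y/N binary
row_layout(c(1, 0, 1))                     # numeric 1/0 binary
row_layout(c("Male", "Female"))            # two levels, not Y/N
row_layout(c("Well", "Moderate", "Poor"))  # three levels
```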
A tibble with one row per variable (multi-row for factors), containing:
Variable names with appropriate indentation
Summary statistics (mean +/- SD, median [IQR], or n (%) as appropriate)
Shapiro-Wilk P values (only if print_normality = TRUE)
data(tern_colon)

# Basic descriptive summary
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE)

# With normality-aware formatting and category section headers
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE,
      category_start = c("Patient Demographics" = "Age (yr)",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)"))

# Force specific variables to ordinal (median [IQR]) display
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE,
      force_ordinal = c("Positive_Lymph_Nodes_n"))

# Export to Word (writes a file to tempdir)
ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE, open_doc = FALSE,
      output_docx = file.path(tempdir(), "descriptive.docx"),
      category_start = c("Patient Demographics" = "Age (yr)",
                         "Surgical Findings" = "Colonic Obstruction",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)",
                         "Outcomes" = "Recurrence"))
Creates a grouped summary table with optional statistical testing for group
comparisons. Supports numeric and categorical variables; numeric variables
can be treated as ordinal via force_ordinal. Includes options to
calculate P values and odds ratios. For descriptive
(ungrouped) tables, use ternD.
ternG(
  data,
  vars = NULL,
  exclude_vars = NULL,
  group_var,
  force_ordinal = NULL,
  force_normal = NULL,
  force_continuous = NULL,
  group_order = NULL,
  output_xlsx = NULL,
  output_docx = NULL,
  OR_col = FALSE,
  OR_method = "dynamic",
  consider_normality = "ROBUST",
  print_normality = FALSE,
  show_test = FALSE,
  p_digits = 3,
  round_intg = FALSE,
  round_decimal = NULL,
  smart_rename = TRUE,
  insert_subheads = TRUE,
  factor_order = "mixed",
  table_font_size = 9,
  methods_doc = TRUE,
  methods_filename = "TernTables_methods.docx",
  category_start = NULL,
  plain_header = NULL,
  manual_italic_indent = NULL,
  manual_underline = NULL,
  indent_info_column = FALSE,
  show_total = TRUE,
  table_caption = NULL,
  table_footnote = NULL,
  abbreviation_footnote = NULL,
  variable_footnote = NULL,
  index_style = "symbols",
  line_break_header = getOption("TernTables.line_break_header", TRUE),
  post_hoc = FALSE,
  p_adjust = FALSE,
  p_adjust_display = "fdr_only",
  open_doc = TRUE,
  citation = TRUE,
  font_family = getOption("TernTables.font_family", "Arial"),
  show_missing = FALSE,
  show_p = TRUE,
  zero_to_dash = FALSE,
  percentage_compute = "column",
  categorical_posthoc = FALSE,
  show_missingness = FALSE,
  missing_indicators = NULL
)
data |
Tibble containing all variables. |
vars |
Character vector of variables to summarize. Defaults to all except |
exclude_vars |
Character vector of variable(s) to exclude. |
group_var |
Character, the grouping variable (factor or character with >=2 levels). |
force_ordinal |
Character vector of variables to treat as ordinal (i.e., use medians/IQR and nonparametric tests). |
force_normal |
Character vector of variable names to treat as normally distributed, bypassing all
normality assessment (Gates 1–4 under |
force_continuous |
Character vector of variables to force treatment as continuous (mean |
group_order |
Optional character vector to specify a custom group level order. |
output_xlsx |
Optional filename to export the table as an Excel file. |
output_docx |
Optional filename to export the table as a Word document. |
OR_col |
Logical; if |
OR_method |
Character; controls how odds ratios are calculated when |
consider_normality |
Character or logical; controls how continuous variables are routed to
parametric vs. non-parametric tests.
|
print_normality |
Logical; if |
show_test |
Logical; if |
p_digits |
Integer; number of decimal places for P values (default 3). |
round_intg |
Logical; if |
round_decimal |
Integer or |
smart_rename |
Logical; if |
insert_subheads |
Logical; if |
factor_order |
Character; controls the ordering of factor levels in the output.
|
table_font_size |
Numeric; font size for Word document output tables. Default is 9. |
methods_doc |
Logical; if |
methods_filename |
Character; filename for the methods document. Default is |
category_start |
Named character vector specifying where to insert category headers.
Names are the header label text to display; values are the anchor variable – either the
original column name (e.g. |
plain_header |
Named character vector, same interface as |
manual_italic_indent |
Character vector of display variable names (post-cleaning) that should be
formatted as italicized and indented in Word output – matching the appearance of factor sub-category
rows. Has no effect on the returned tibble; only applies when |
manual_underline |
Character vector of display variable names (post-cleaning) that should be
formatted as underlined in Word output – matching the appearance of multi-category variable headers.
Has no effect on the returned tibble; only applies when |
indent_info_column |
Logical; if |
show_total |
Logical; if |
table_caption |
Optional character string for a table caption to display above the table in
the Word document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table.
Default is |
table_footnote |
Optional character string for a footnote to display below the table in the
Word document. Rendered as size 6 Arial italic with a double-bar border above and below.
Default is |
abbreviation_footnote |
Optional character string listing abbreviations. Always printed
first in the footnote block. Default |
variable_footnote |
Optional named character vector. Names are display variable names
(case-insensitive); values are the footnote definition text. Each variable gets the next
symbol appended to its name in the table, and the footnote block lists each definition
below the abbreviation line. To share one footnote between multiple variables, separate
their names with a pipe: |
index_style |
Character; |
line_break_header |
Logical; if |
post_hoc |
Logical; if |
p_adjust |
Logical; if |
p_adjust_display |
Character; controls how BH-corrected P values appear in the output
when |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family name used for all Word output (table,
captions, footnotes, methods document). Any font installed on the system that
renders the document may be used. Popular options include |
show_missing |
Logical; if |
show_p |
Logical; if |
zero_to_dash |
Logical; if |
percentage_compute |
Character; controls the denominator used when computing percentages
for categorical variables. |
categorical_posthoc |
Logical; if |
show_missingness |
Controls whether a column of missing-value percentages is appended
to the table. Options: |
missing_indicators |
Optional character vector of string values to treat as missing
in addition to (or instead of) the built-in ternP defaults. When |
Independence assumption: all statistical tests applied by this
function (Welch's t-test, Wilcoxon rank-sum, Welch ANOVA,
Kruskal-Wallis, chi-squared, and Fisher's exact) assume that observations
are independent — each row must represent a distinct, unrelated subject.
ternG is not appropriate for repeated-measures, longitudinal, or
clustered data (e.g. pre/post measurements, matched pairs, or patients
nested within sites).
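For orientation, each of the tests named above has a base R counterpart. The sketch below runs them on the built-in mtcars data; it does not show how ternG() chooses between them, and whether the package calls these exact functions internally is an assumption.

```r
# Built-in data: am has two groups, cyl has three.
df <- transform(mtcars, am = factor(am), cyl = factor(cyl), vs = factor(vs))

p_values <- c(
  welch_t        = t.test(mpg ~ am, data = df, var.equal = FALSE)$p.value,
  wilcoxon       = wilcox.test(mpg ~ am, data = df, exact = FALSE)$p.value,
  welch_anova    = oneway.test(mpg ~ cyl, data = df, var.equal = FALSE)$p.value,
  kruskal_wallis = kruskal.test(mpg ~ cyl, data = df)$p.value,
  chi_squared    = chisq.test(table(df$am, df$vs))$p.value,
  fisher_exact   = fisher.test(table(df$am, df$vs))$p.value
)
round(p_values, 4)
```

All six calls treat rows as independent observations, which is why paired or clustered designs need different methods.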
A tibble with one row per variable (multi-row for multi-level factors), showing summary statistics by group, P values, test type, and optionally odds ratios and a total summary column.
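The Benjamini-Hochberg (BH) correction referred to by the p_adjust argument is available in base R as p.adjust(). The P values below are made up for illustration, and which P values ternG() pools into one correction family is not stated in this section.

```r
raw_p <- c(0.001, 0.012, 0.030, 0.045, 0.200)   # illustrative values only
bh_p  <- p.adjust(raw_p, method = "BH")         # BH false discovery rate adjustment

data.frame(raw_p, bh_p)
# bh_p: 0.005, 0.030, 0.050, 0.05625, 0.200
```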
data(tern_colon)

# 2-group comparison
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      methods_doc = FALSE)

# 2-group comparison with odds ratios
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      OR_col = TRUE, methods_doc = FALSE)

# 3-group comparison
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Treatment_Arm",
      group_order = c("Observation", "Levamisole", "Levamisole + 5FU"),
      methods_doc = FALSE)

# 2-group comparison with BH FDR correction (fdr_only, the default display)
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      p_adjust = TRUE, methods_doc = FALSE)

# 2-group comparison with BH FDR correction (show raw + corrected side by side)
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      p_adjust = TRUE, p_adjust_display = "both", methods_doc = FALSE)

# Export to Word (writes a file to tempdir)
ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
      OR_col = TRUE, methods_doc = FALSE, open_doc = FALSE,
      output_docx = file.path(tempdir(), "comparison.docx"),
      category_start = c("Patient Demographics" = "Age (yr)",
                         "Tumor Characteristics" = "Positive Lymph Nodes (n)"))
ternP() cleans a raw data frame loaded from a CSV or XLSX file,
applying a standardized set of transformations and performing validation
checks before the data is passed to ternG or
ternD.
ternP(data, mode = "auto", extra_na = NULL, drop_cols = NULL)
data |
A data frame or tibble as loaded from a CSV or XLSX file (e.g.
via |
mode |
Preprocessing mode. One of
|
extra_na |
Optional character vector of additional string values to
treat as missing (converted to |
drop_cols |
Optional character vector of column names to drop from
the data before cleaning begins. Intended for use in |
A named list with three elements:
clean_data: A tibble containing the fully cleaned dataset,
ready to pass to ternG() or ternD().
sparse_rows: A tibble of rows from clean_data where
more than 50% of values are NA. These rows are retained
in clean_data but extracted here for optional review or download.
An empty tibble if no sparse rows exist.
feedback: A named list of feedback items. Each element is
NULL if the corresponding transformation was not triggered, or a
value describing what changed:
  string_na_converted: A named list with elements
  total (integer count of values converted) and cols
  (character vector of affected column names), or NULL if no
  string NA values were found.
  blank_rows_removed: A named list with elements
  count (integer) and row_indices (integer vector of
  original row positions removed), or NULL if none.
  sparse_rows_flagged: A named list with elements
  count (integer) and row_indices (integer vector of
  row positions in clean_data with >50% missingness),
  or NULL if none.
  case_normalized_vars: A named list with elements
  cols (character vector of affected column names) and
  detail (a named list per column, each with
  changed_from and changed_to character vectors
  showing the exact value changes), or NULL if none.
  dropped_user_cols: Character vector of column names
  explicitly dropped via the drop_cols parameter, or
  NULL if drop_cols was not used.
  manual_mode: Logical. TRUE when mode = "manual"
  was used (PHI check skipped), FALSE otherwise.
  dropped_empty_cols: Character vector of column names
  (or "" for unnamed columns) that were dropped because they
  were 100% empty, or NULL if none.
  date_cols_detected: Character vector of column names
  that appear to contain date values: either R Date/POSIXct
  types (from Excel) or character columns where 80% of non-NA values
  match a common date pattern (from CSV). These columns are not
  dropped automatically; the caller should decide whether to exclude
  them or keep them as categorical variables.
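The sparse-row and date-column checks can be approximated in a few lines of base R. This is a sketch under stated assumptions, not ternP()'s implementation: looks_like_date() is a hypothetical helper, the only pattern it recognises is ISO-style YYYY-MM-DD, and treating the 80% figure as "at least 80%" is a guess.

```r
df <- data.frame(
  a     = c(1, NA, 3),
  b     = c("x", NA, "z"),
  c     = c(TRUE, NA, NA),
  visit = c("2020-01-05", "2020-02-11", NA),
  stringsAsFactors = FALSE
)

# Rows where more than 50% of values are NA (flagged, not removed)
sparse_idx <- unname(which(rowMeans(is.na(df)) > 0.5))

# Hypothetical date check: Date/POSIXct types, or mostly ISO-formatted strings
looks_like_date <- function(x, threshold = 0.8) {
  if (inherits(x, c("Date", "POSIXct"))) return(TRUE)
  if (!is.character(x)) return(FALSE)
  x <- x[!is.na(x)]
  if (length(x) == 0) return(FALSE)
  mean(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)) >= threshold   # assumed pattern
}
date_cols <- names(df)[vapply(df, looks_like_date, logical(1))]

sparse_idx   # 2: row 2 is 75% missing; row 3 is exactly 50% and is not flagged
date_cols    # "visit"
```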
Date columns are detected (R Date/POSIXct types, or
character columns where 80% of values match a common date pattern) and
reported in feedback$date_cols_detected. They are not dropped
automatically — the caller decides whether to exclude or keep them.
String NA values ("NA", "na", "N/A", "NaN",
"missing", "unknown", "unk", "not available",
"not applicable", "none", "null", "nil",
"-", ".", "?") are converted to NA
(matching is case-insensitive).
Leading and trailing whitespace is trimmed from all character columns.
Columns that are 100% empty (all NA) are silently dropped.
Rows where every cell is NA are removed.
Character columns where values differ only by capitalization
(e.g. "Male" vs "MAle") are standardized to title case.
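Several of the transformations listed above can be imitated in base R. The function clean_sketch() is a hypothetical illustration rather than ternP() itself: it trims before matching string NA values (the package's actual order is not stated), and it omits the title-case normalization, the PHI check, and feedback reporting.

```r
# String values treated as missing, lower-cased for case-insensitive matching
na_strings <- c("na", "n/a", "nan", "missing", "unknown", "unk", "not available",
                "not applicable", "none", "null", "nil", "-", ".", "?")

clean_sketch <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    col <- trimws(col)                          # trim leading/trailing whitespace
    col[tolower(col) %in% na_strings] <- NA     # convert string NA values
    col
  })
  df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]   # drop 100% empty columns
  df[rowSums(!is.na(df)) > 0, , drop = FALSE]         # drop rows that are all NA
}

raw <- data.frame(
  sex   = c(" Male", "unknown", "N/A"),
  notes = c("?", "-", "NA"),
  age   = c(61, 58, NA),
  stringsAsFactors = FALSE
)
clean_sketch(raw)   # two rows remain; the all-missing notes column is dropped
```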
ternP() stops with a descriptive error if:
Any column name matches a protected health information (PHI) pattern
(e.g. MRN, DOB, FirstName). De-identified research
identifiers such as patient_id, subject_id, and
participant_id are explicitly excluded, as are clinical-event
dates (admission date, discharge date, visit date, etc.). Only
personal-identity dates such as DOB and DOD are flagged.
Any column with a blank or whitespace-only header contains data. Completely empty unnamed columns are silently dropped and do not trigger this error.
ternG for grouped comparisons, ternD for descriptive statistics.
# Load a messy CSV and preprocess it
path <- system.file("extdata/csv", "tern_colon_messy.csv", package = "TernTables")
raw <- read.csv(path, stringsAsFactors = FALSE)
result <- ternP(raw)

# Access cleaned data
result$clean_data

# Review preprocessing feedback
result$feedback

# Sparse rows flagged (>50% missing), retained but not removed
result$sparse_rows
ternStyle() renders any user-built tibble into a Word document with
the exact same visual style as tables produced by ternG(),
ternD(), and word_export() – Arial font, grey header,
double-bar footer, caption/footnote block, and citation footer.
    ternStyle(
      tbl,
      filename = NULL,
      col1_name = NULL,
      subheader_rows = NULL,
      bold_rows = NULL,
      bold_sig = NULL,
      italic_rows = NULL,
      bold_cols = NULL,
      italic_cols = NULL,
      header_format_follow = FALSE,
      round_intg = FALSE,
      round_decimal = NULL,
      font_size = 9,
      category_start = NULL,
      plain_header = NULL,
      manual_italic_indent = NULL,
      manual_underline = NULL,
      table_caption = NULL,
      table_footnote = NULL,
      abbreviation_footnote = NULL,
      variable_footnote = NULL,
      index_style = "symbols",
      col1_header = NULL,
      line_break_header = FALSE,
      open_doc = TRUE,
      citation = TRUE,
      font_family = getOption("TernTables.font_family", "Arial")
    )
tbl |
A data frame or tibble. The first column is used as the row-label
column (rendered as "Variable" unless renamed via |
filename |
Output file path ending in |
col1_name |
Optional character string. If supplied, the first column is
renamed to this label in the rendered table. The column need not be named
|
subheader_rows |
Character vector of labels that already exist as rows
in |
bold_rows |
Integer vector of body row indices (1-based, final rendered
table) to bold across every column. Applied after all structural formatting
so it always wins. Default |
bold_sig |
Optional named list for cell-level p-value-based bolding.
Use this when your tibble has pre-formatted p-value strings in columns that
are not named
The Variable column is never modified by bold_sig = list(
p_cols = c("Uni p", "Multi p"),
hr_cols = c("Uni HR (95% CI)", "Multi HR (95% CI)"),
threshold = 0.05
)
Default |
italic_rows |
Integer vector of body row indices to italicize across
every column. Default |
bold_cols |
Integer vector of column indices (1-based) to bold across
all body rows. Default |
italic_cols |
Integer vector of column indices to italicize across all
body rows. Default |
header_format_follow |
Logical; if |
round_intg |
Logical; passed to |
round_decimal |
Integer or NULL; if provided, rounds all numeric values in the
table to this many decimal places before rendering. Passed to |
font_size |
Numeric; font size for table body. Default |
category_start |
Named character vector; same as in |
plain_header |
Named character vector; same as in |
manual_italic_indent |
Character vector of row labels to italicize and
indent (sub-item appearance). Default |
manual_underline |
Character vector of row labels to underline (multi-
category header appearance without the full subheader treatment). Default
|
table_caption |
Optional character string for the caption above the
table. Default |
table_footnote |
Optional character string for a footnote below the
table. Default |
abbreviation_footnote |
Optional character string (or character vector)
of abbreviations. Always printed first in the footnote block. Default
|
variable_footnote |
Optional named character vector of per-variable
footnote definitions (case-insensitive name match). To share one footnote
symbol between multiple variables, separate their names with a pipe:
|
index_style |
Character; |
col1_header |
Optional character string. Overrides the top-left header
cell. When |
line_break_header |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family name used for all Word output.
Defaults to |
Use this function when you have pre-computed summary statistics in a tibble
(e.g. a custom cross-tab or manually assembled output table) and want it to
match the rest of your TernTables document without running it through the full
ternG/ternD pipeline.
Invisibly returns the input tibble (after renaming and coercion)
with a "ternB_meta" attribute attached. This makes the result
directly passable to ternB for bundling with other tables
into a combined Word document.
    library(tibble)

    my_tbl <- tibble(
      Variable  = c("Section A", "Row 1", "Row 2", "Section B", "Row 3"),
      `Group 1` = c("", "12 (40%)", "18 (60%)", "", "9 (30%)"),
      `Group 2` = c("", "15 (50%)", "15 (50%)", "", "21 (70%)")
    )

    ternStyle(
      tbl            = my_tbl,
      filename       = file.path(tempdir(), "custom_table.docx"),
      subheader_rows = c("Section A", "Section B"),
      open_doc       = FALSE,
      citation       = FALSE
    )
Format a mean +/- SD string
    val_format(mean, sd)
mean |
Numeric mean value. Formatted to 1 decimal place. |
sd |
Numeric standard deviation. Formatted to 1 decimal place. |
A character string of the form "X.X ± Y.Y" where both values are
rendered to 1 decimal place using fixed-point notation.
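The described output format can be reproduced with base R's `sprintf()`. The sketch below is an illustration of that format, not the package's source; `fmt_mean_sd()` is a hypothetical name.

```r
# Hypothetical stand-in for the documented behaviour: mean and SD are each
# rendered to 1 decimal place in fixed-point notation, joined by a plus-minus sign.
fmt_mean_sd <- function(mean, sd) {
  sprintf("%.1f \u00b1 %.1f", mean, sd)
}

example_label <- fmt_mean_sd(61.27, 9.84)
example_label
```

Because `sprintf()` is vectorised, the same helper also formats whole columns of means and standard deviations in one call.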
Format a P value for reporting
    val_p_format(p, digits = 3)
p |
Numeric P value in the range [0, 1]. |
digits |
Integer; number of decimal places for reported P values. Default is 3.
Note: for p < 0.001, the value is reported in scientific notation with 1 significant figure
regardless of |
A character string. Values < 0.001 are formatted in scientific notation with 1 significant
figure (e.g., "8E-4"). All other values use fixed-point notation rounded to digits
decimal places.
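The two-branch rule described above can be sketched with base R's `formatC()`. This is an approximation written from the description only; `fmt_p()` is a hypothetical name, and details such as trailing zeros in the fixed-point branch are assumptions.

```r
# Hypothetical sketch of the documented rule, not the package's implementation.
fmt_p <- function(p, digits = 3) {
  if (p < 0.001) {
    # Scientific notation with 1 significant figure, e.g. "8E-04" ...
    s <- formatC(p, format = "E", digits = 0)
    # ... then strip leading zeros from the exponent to get "8E-4".
    sub("E([+-])0+", "E\\1", s)
  } else {
    # Fixed-point notation rounded to `digits` decimal places.
    formatC(p, format = "f", digits = digits)
  }
}

small_p   <- fmt_p(8e-4)
regular_p <- fmt_p(0.0432)
```

Here `small_p` takes the scientific-notation branch and `regular_p` the fixed-point branch; the `digits` argument only affects the latter, matching the note in the argument description.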
Export TernTables output to a formatted Word document
    word_export(
      tbl,
      filename,
      round_intg = FALSE,
      round_decimal = NULL,
      font_size = 9,
      category_start = NULL,
      plain_header = NULL,
      subheader_rows = NULL,
      bold_rows = NULL,
      bold_sig = NULL,
      italic_rows = NULL,
      bold_cols = NULL,
      italic_cols = NULL,
      header_format_follow = FALSE,
      manual_italic_indent = NULL,
      manual_underline = NULL,
      table_caption = NULL,
      table_footnote = NULL,
      abbreviation_footnote = NULL,
      posthoc_footnote = NULL,
      variable_footnote = NULL,
      index_style = "symbols",
      page_break_after = FALSE,
      col1_header = NULL,
      line_break_header = getOption("TernTables.line_break_header", TRUE),
      open_doc = TRUE,
      citation = TRUE,
      font_family = getOption("TernTables.font_family", "Arial")
    )
tbl |
A tibble created by ternG or ternD |
filename |
Output file path ending in .docx |
round_intg |
Logical; if TRUE, adds note about integer rounding. Default is FALSE. |
round_decimal |
Integer or NULL; if provided, rounds all numeric values in the table
to this many decimal places before rendering. Default is |
font_size |
Numeric; font size for table body. Default is 9. |
category_start |
Named character vector specifying category headers. Names are header label text; values are anchor variable names – either the original column name or the cleaned display name (both forms accepted). |
plain_header |
Named character vector, same interface as |
subheader_rows |
Character vector of labels that already exist as rows in the table and
should be formatted as full category section headers (merged across all columns, bold, with a
bottom border line). Unlike |
bold_rows |
Integer vector of body row indices (1-based, in the final rendered table) to
bold across every column. Applied as the last formatting pass so it overrides structural
formatting. Default |
bold_sig |
Optional named list for cell-level conditional bolding based on parsed p-values.
Intended for use with
For each p-value cell where the parsed numeric value is below |
italic_rows |
Integer vector of body row indices to italicize across every column.
Default |
bold_cols |
Integer vector of column indices (1-based) to bold across all body rows.
Default |
italic_cols |
Integer vector of column indices to italicize across all body rows.
Default |
header_format_follow |
Logical; if |
manual_italic_indent |
Character vector of display variable names (post-cleaning) to force into italicized and indented formatting, matching the appearance of factor sub-category rows (e.g., levels of a multi-category variable). Use this for rows that should visually appear as sub-items but are not automatically detected as such. |
manual_underline |
Character vector of display variable names (post-cleaning) to force into underlined formatting, matching the appearance of multi-category variable header rows. Use this for rows that should visually appear as section headers but are not automatically detected as such. |
table_caption |
Optional character string to display as a caption above the table in the Word
document. Rendered as size 11 Arial bold, single-spaced with a small gap before the table.
Default is |
table_footnote |
Optional character string to display as a footnote below the table in the Word
document. Rendered as size 6 Arial italic. A double-bar border is applied above and below the
footnote row. Default is |
abbreviation_footnote |
Optional character string (or character vector, which will be
collapsed with spaces) listing abbreviations to display at the top of the footnote block.
Always printed first, before any variable-specific footnote lines. Default |
posthoc_footnote |
Optional character string describing post-hoc CLD superscript
conventions. When supplied by |
variable_footnote |
Optional named character vector. Names are display variable names as
they appear in the table (case-insensitive match); values are the footnote definition text
for that variable. Each entry is assigned the next symbol in the sequence (*, dagger,
double-dagger, ...) and the symbol is appended to the variable name in column 1.
The footnote block lists each as |
index_style |
Character; controls the footnote symbol sequence. |
page_break_after |
Logical; if |
col1_header |
Optional character string. Overrides the top-left header cell text.
When |
line_break_header |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family used for the entire Word table and its caption,
footnote, and citation. Any font name accepted by the rendering system is valid (Word
will fall back to its default if the font is not installed). Can also be set package-wide
via |
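The `font_family` default in the usage signature is read with `getOption("TernTables.font_family", "Arial")`, so a session-wide default can presumably be set through base R's `options()`. The sketch below uses only base R and makes no TernTables call; the font name is an arbitrary example.

```r
# Clear the option and remember its previous value so it can be restored.
old <- options(TernTables.font_family = NULL)

# With the option unset, getOption() falls back to its second argument.
default_font <- getOption("TernTables.font_family", "Arial")

# Setting the option changes what the same lookup returns.
options(TernTables.font_family = "Times New Roman")
session_font <- getOption("TernTables.font_family", "Arial")

# Restore whatever was set before this sketch ran.
options(old)
```

An explicit `font_family` argument in a call would still take precedence over the option, since the option only supplies the default value.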
Invisibly returns the path to the written Word file.
    data(tern_colon)

    tbl <- ternD(tern_colon, exclude_vars = c("ID"), methods_doc = FALSE, open_doc = FALSE)

    word_export(
      tbl = tbl,
      filename = file.path(tempdir(), "descriptive.docx"),
      open_doc = FALSE,
      category_start = c(
        "Patient Demographics"  = "Age (yr)",
        "Tumor Characteristics" = "Positive Lymph Nodes (n)"
      )
    )
Generates a Word document summarising the preprocessing transformations
applied by ternP. Only sections for triggered transformations
are written; if the data required no preprocessing, a single sentence
stating that is produced instead. The document can be attached to a
data-management log or supplemental materials.
    write_cleaning_doc(
      result,
      filename = "cleaning_summary.docx",
      font_family = getOption("TernTables.font_family", "Arial"),
      open_doc = TRUE,
      citation = TRUE
    )
result |
A |
filename |
Output file path ending in |
font_family |
Character; font family for the Word document. Default |
open_doc |
Logical; if |
citation |
Logical; if |
Invisibly returns the path to the written Word file.
    path <- system.file("extdata/csv", "tern_colon_messy.csv", package = "TernTables")
    raw <- read.csv(path, stringsAsFactors = FALSE)
    result <- ternP(raw)

    write_cleaning_doc(
      result,
      filename = file.path(tempdir(), "cleaning_summary.docx"),
      open_doc = FALSE
    )
Generates a Word document containing a methods paragraph describing the
statistical approach used in a specific ternG or ternD run.
The paragraph is fully dynamic: it reflects the tests that were actually used,
the number of comparison groups, whether odds ratios were calculated, and
whether post-hoc testing was performed. It is headed by a bold
Statistical Methods label and followed by a brief attribution footer.
    write_methods_doc(
      tbl,
      filename,
      n_levels = 2,
      OR_col = FALSE,
      OR_method = "dynamic",
      source = "ternG",
      post_hoc = FALSE,
      categorical_posthoc = FALSE,
      cat_posthoc_fisher_vars = character(0),
      show_missingness = FALSE,
      missing_indicators = NULL,
      boilerplate = FALSE,
      p_adjust = FALSE,
      open_doc = TRUE,
      citation = TRUE,
      font_family = getOption("TernTables.font_family", "Arial")
    )
tbl |
A tibble created by |
filename |
Output file path ending in |
n_levels |
Number of group levels used in |
OR_col |
Logical; whether odds ratios were calculated. Default |
OR_method |
Character; the OR calculation method used in |
source |
Character; |
post_hoc |
Logical; whether pairwise post-hoc testing was requested
( |
categorical_posthoc |
Logical; whether adjusted standardized residuals
were requested ( |
cat_posthoc_fisher_vars |
Character vector of variable names for which
Fisher's exact test was the omnibus test while |
show_missingness |
Logical or character; whether missingness columns were added
to the table ( |
missing_indicators |
Character vector of string values treated as missing in
addition to R |
boilerplate |
Logical; if |
p_adjust |
Logical; if |
open_doc |
Logical; if |
citation |
Logical; if |
font_family |
Character; font family for the Word document. Default |
When boilerplate = TRUE, all run-specific arguments are ignored and a
comprehensive reference document is written instead, covering all five standard
TernTables configurations with package-default phrasing. See the
boilerplate parameter for details.
Invisibly returns the methods paragraph text as a character string
(or, when boilerplate = TRUE, invisibly returns the output file path).
Useful for programmatic inspection or testing without opening the Word file.
    data(tern_colon)

    tbl <- ternG(tern_colon, exclude_vars = c("ID"), group_var = "Recurrence",
                 methods_doc = FALSE, open_doc = FALSE)

    write_methods_doc(tbl, filename = file.path(tempdir(), "methods.docx"),
                      open_doc = FALSE)

    # Write a comprehensive reference document covering all configurations.
    write_methods_doc(tbl = NULL,
                      filename = file.path(tempdir(), "boilerplate_methods.docx"),
                      boilerplate = TRUE, open_doc = FALSE)