Takes a data frame with one row per article (for example, the output of [rt_all_pmc()] joined with [rt_data_code_pmc()] and stacked over many articles) and returns the prevalence of each transparency indicator. For each indicator it reports the number of articles assessed, the number in which the indicator was detected, and the apparent prevalence with its Wilson confidence interval. When `adjust = TRUE` (the default), it also reports a prevalence corrected for the detector's sensitivity and specificity (the Rogan-Gladen estimator).
Usage
rt_summary(
  data,
  indicators = NULL,
  by = NULL,
  adjust = TRUE,
  accuracy = NULL,
  conf_level = 0.95
)
Arguments
- data
A data frame with one row per article. Indicator columns must be logical or numeric 0/1 and named as in [rt_all_pmc()]: `is_coi_pred`, `is_fund_pred`, `is_register_pred`, `is_open_data`, `is_open_code`, `is_novelty_pred`, `is_replication_pred`, `is_ai_pred`, `is_open_access` and `is_reporting_pred`. `NA` marks an article that was not assessed for that indicator (for example `is_ai_pred` before 2023) and is excluded from its denominator. Other values are rejected rather than silently coerced.
- indicators
Optional character vector of indicator columns to summarize. Defaults to every recognized indicator present in `data`.
- by
Optional name of a grouping column (for example a publication year, journal or article type); the summary is then computed within each group.
- adjust
If `TRUE` (default), add a prevalence corrected for detector sensitivity and specificity using `accuracy`. Indicators absent from `accuracy` receive `NA` corrected values.
- accuracy
A data frame of detector accuracy with columns `variable`, `sensitivity` and `specificity`. If `NULL` (the default), [rt_accuracy] is used.
- conf_level
Confidence level for the intervals (default `0.95`).
Value
A tibble with one row per indicator (per group, if `by` is given): the grouping column (when `by` is used), `indicator`, `label`, `n_articles`, `n_detected`, `percent`, `conf_low`, `conf_high` and, when `adjust = TRUE`, `adj_percent`, `adj_low` and `adj_high`. Percentages and interval bounds are on the 0-100 scale.
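The quantities behind these columns can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `summarise_indicator()` is a hypothetical helper, the sensitivity and specificity passed in the usage line are made-up values, and clamping the corrected estimate to the 0-1 range is an assumption. The corrected interval (`adj_low`, `adj_high`) is left out because its construction is not described here.

```r
# Hypothetical helper (not part of the package): summarise one 0/1 indicator.
summarise_indicator <- function(x, sensitivity = NA_real_, specificity = NA_real_,
                                conf_level = 0.95) {
  # Accept only logical or numeric 0/1 (NA allowed); reject anything else.
  stopifnot(is.logical(x) || is.numeric(x))
  x <- as.numeric(x)
  stopifnot(all(x %in% c(0, 1, NA)))

  n <- sum(!is.na(x))        # NA = not assessed, excluded from the denominator
  k <- sum(x, na.rm = TRUE)  # number of articles in which the indicator was detected
  p <- k / n                 # apparent prevalence

  # Wilson score interval, without continuity correction.
  z <- qnorm(1 - (1 - conf_level) / 2)
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)

  # Rogan-Gladen estimator; clamping to [0, 1] is an assumption of this sketch.
  # Missing sensitivity or specificity propagates to an NA corrected value.
  adj <- (p + specificity - 1) / (sensitivity + specificity - 1)
  adj <- min(max(adj, 0), 1)

  c(n_articles = n, n_detected = k, percent = 100 * p,
    conf_low = 100 * (centre - half), conf_high = 100 * (centre + half),
    adj_percent = 100 * adj)
}

# 845 detections among 1200 assessed articles, plus one unassessed (NA) article.
x <- c(rep(1, 845), rep(0, 355), NA)
# Accuracy values below are made up for illustration.
round(summarise_indicator(x, sensitivity = 0.95, specificity = 0.98), 1)
```

With these counts the apparent prevalence and Wilson bounds round to 70.4, 67.8 and 72.9, which agrees with the `is_coi_pred` row in the examples below. The corrected value depends entirely on the accuracy figures supplied.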
Examples
data(rt_demo)
rt_summary(rt_demo)
#> # A tibble: 8 × 10
#>   indicator   label n_articles n_detected percent conf_low conf_high adj_percent
#>   <chr>       <chr>      <int>      <int>   <dbl>    <dbl>     <dbl>       <dbl>
#> 1 is_coi_pred Conf…       1200        845   70.4     67.8       72.9       70.8
#> 2 is_fund_pr… Fund…       1200        955   79.6     77.2       81.8       79.4
#> 3 is_registe… Prot…       1200        356   29.7     27.2       32.3       30.8
#> 4 is_open_da… Data…       1200        245   20.4     18.2       22.8       25.7
#> 5 is_open_co… Code…       1200        102    8.5      7.05      10.2        9.13
#> 6 is_novelty… Nove…       1200        653   54.4     51.6       57.2       62.8
#> 7 is_replica… Repl…       1200        113    9.42     7.89      11.2        8.67
#> 8 is_ai_pred  AI d…        282         71   25.2     20.5       30.6       NA
#> # ℹ 2 more variables: adj_low <dbl>, adj_high <dbl>
# Apparent prevalence only, no accuracy correction
rt_summary(rt_demo, adjust = FALSE)
#> # A tibble: 8 × 7
#>   indicator           label     n_articles n_detected percent conf_low conf_high
#>   <chr>               <chr>          <int>      <int>   <dbl>    <dbl>     <dbl>
#> 1 is_coi_pred         Conflict…       1200        845   70.4     67.8       72.9
#> 2 is_fund_pred        Funding …       1200        955   79.6     77.2       81.8
#> 3 is_register_pred    Protocol…       1200        356   29.7     27.2       32.3
#> 4 is_open_data        Data sha…       1200        245   20.4     18.2       22.8
#> 5 is_open_code        Code sha…       1200        102    8.5      7.05      10.2
#> 6 is_novelty_pred     Novelty         1200        653   54.4     51.6       57.2
#> 7 is_replication_pred Replicat…       1200        113    9.42     7.89      11.2
#> 8 is_ai_pred          AI discl…        282         71   25.2     20.5       30.6
# By article type
rt_summary(rt_demo, by = "type")
#> # A tibble: 24 × 11
#>    type         indicator label n_articles n_detected percent conf_low conf_high
#>    <chr>        <chr>     <chr>      <int>      <int>   <dbl>    <dbl>     <dbl>
#>  1 review-arti… is_coi_p… Conf…        241        174   72.2     66.2       77.5
#>  2 review-arti… is_fund_… Fund…        241        192   79.7     74.1       84.3
#>  3 review-arti… is_regis… Prot…        241         81   33.6     27.9       39.8
#>  4 review-arti… is_open_… Data…        241         47   19.5     15.0       25.0
#>  5 review-arti… is_open_… Code…        241         21    8.71     5.77      13.0
#>  6 review-arti… is_novel… Nove…        241        120   49.8     43.5       56.1
#>  7 review-arti… is_repli… Repl…        241         22    9.13     6.11      13.4
#>  8 review-arti… is_ai_pr… AI d…         66         14   21.2     13.1       32.5
#>  9 systematic-… is_coi_p… Conf…        132         82   62.1     53.6       69.9
#> 10 systematic-… is_fund_… Fund…        132        109   82.6     75.2       88.1
#> # ℹ 14 more rows
#> # ℹ 3 more variables: adj_percent <dbl>, adj_high <dbl>, adj_low <dbl>