
Whilst dsBaseClient provides the functions 'ds.table' and 'ds.summary' to calculate descriptive statistics, their output is not in a very usable format. This function extracts key descriptive statistics and returns them as tibbles.

Usage

dh.getStats(df = NULL, vars = NULL, digits = 2, conns = NULL, checks = TRUE)

Arguments

df

Character specifying a server-side data frame.

vars

Character vector of columns within df to summarise.

digits

Optionally, the number of decimal places to round output to. Default is 2.

conns

DataSHIELD connections object.

checks

Logical; if TRUE checks are performed prior to running the function. Default is TRUE.

Value

A client-side list with two elements: "categorical" and "continuous". Each element contains a tibble with descriptive statistics as follows.

Categorical:

  • variable = Variable name.

  • cohort = Cohort name, where "combined" refers to pooled statistics.

  • category = Level of variable, including 'missing' as a category.

  • value = Number of observations within category.

  • cohort_n = Total number of observations per cohort within df.

  • valid_n = Number of valid observations for variable (sum of ns for all categories excluding missing).

  • missing_n = Number of missing observations.

  • perc_valid = Number of observations within a category as percentage of valid_n.

  • perc_total = Number of observations within a category as percentage of cohort_n.
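The relationship between the count and percentage columns can be illustrated with a little arithmetic. The sketch below uses made-up counts and plain base R; it is not the package's internal code, and it assumes that perc_valid is only meaningful for the non-missing categories.

```r
# Hypothetical counts for one variable in one cohort (not real data).
counts <- c(male = 480, female = 500, missing = 20)

# Totals as described in the list above.
cohort_n  <- sum(counts)                    # all observations in df
missing_n <- unname(counts["missing"])      # observations in the 'missing' category
valid_n   <- cohort_n - missing_n           # sum of ns excluding missing

# Percentages, rounded to 2 decimal places (the default for `digits`).
valid_counts <- counts[names(counts) != "missing"]
perc_valid <- round(valid_counts / valid_n * 100, 2)
perc_total <- round(counts / cohort_n * 100, 2)

print(perc_valid)
print(perc_total)
```

Because the two percentages use different denominators, perc_valid sums to 100 over the non-missing categories, whereas perc_total sums to 100 only once the 'missing' category is included.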

Continuous:

  • variable = As above.

  • cohort = As above.

  • mean = Mean. The pooled value is calculated by fixed-effect meta-analysis.

  • std.dev = Standard deviation. The pooled value is also calculated by fixed-effect meta-analysis.

  • perc_5, perc_10, perc_25, perc_50, perc_75, perc_90, perc_95 = 5th to 95th percentile values.

  • valid_n = As above.

  • cohort_n = As above.

  • missing_n = As above.

  • missing_perc = Percentage of observations missing.

Details

This function also overcomes an issue with ds.summary, which throws an error if the variable is missing in one or more studies. By contrast, dh.getStats returns the variable for that cohort with all statistics set to NA. See 'Value' for details of the returned statistics.
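A minimal call might look like the sketch below. It assumes that `conns` has already been created with DSI::datashield.login() and that each server holds a data frame named "D"; the data frame name and the column names are hypothetical, so the code only runs against live DataSHIELD servers that match those assumptions.

```r
library(dsHelper)

# Assumption: `conns` is an existing DataSHIELD connections object and "D"
# is a server-side data frame. "sex" and "birth_weight" are hypothetical
# column names used purely for illustration.
stats <- dh.getStats(
  df = "D",
  vars = c("sex", "birth_weight"),
  digits = 2,
  conns = conns
)

# The result is a client-side list with one tibble per variable type.
stats$categorical
stats$continuous

# Keep only the pooled rows of the continuous statistics.
stats$continuous[stats$continuous$cohort == "combined", ]
```

Requesting all variables in a single call keeps the output in two tidy tables, which can then be filtered by cohort or variable on the client side.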

See also

Other descriptive functions: dh.anyData(), dh.classDiscrepancy(), dh.createTableOne(), dh.lmTab(), dh.meanByGroup()