Produces a range of descriptive statistics in a useful format

Whilst dsBaseClient provides the functions 'ds.table' and 'ds.summary' to calculate descriptive statistics, their output is not in a convenient format for further use. This function extracts key descriptive statistics and returns them as tibbles.

Usage

dh.getStats(df = NULL, vars = NULL, digits = 2, conns = NULL, checks = TRUE)

Arguments

df: Character specifying the name of a server-side data frame.
vars: Character vector of columns within df to summarise.
digits: Optionally, the number of decimal places to round output to. Default is 2.
conns: DataSHIELD connections object.
checks: Logical; if TRUE, checks are performed prior to running the function. Default is TRUE.

Value

A client-side list with two elements: "categorical" and "continuous". Each element contains a tibble with descriptive statistics as follows.

Categorical:

variable = Variable name.
cohort = Cohort name, where "combined" refers to pooled statistics.
category = Level of variable, including 'missing' as a category.
value = Number of observations within category.
cohort_n = Total number of observations per cohort within df.
valid_n = Number of valid observations for the variable (the sum of observations across all categories, excluding missing).
missing_n = Number of missing observations.
perc_valid = Number of observations within a category as a percentage of valid_n.
perc_total = Number of observations within a category as a percentage of cohort_n.
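To show how the categorical columns relate to each other, here is a minimal base R sketch for a single cohort, computed locally on an ordinary vector rather than through DataSHIELD. The helper name `cat_stats` is hypothetical and this is not the dsHelper implementation; setting perc_valid to NA for the "missing" category is an assumption.

```r
# Hypothetical helper (not part of dsHelper): categorical columns for one cohort,
# computed locally on an ordinary R vector rather than via DataSHIELD.
cat_stats <- function(x, variable, cohort = "cohort_1", digits = 2) {
  cohort_n <- length(x)                    # rows in the cohort's data frame
  missing_n <- sum(is.na(x))
  valid_n <- cohort_n - missing_n          # sum of counts over non-missing categories
  counts <- table(x, useNA = "always")     # always adds an NA level, even if empty
  category <- names(counts)
  category[is.na(category)] <- "missing"   # label the NA level as 'missing'
  value <- as.vector(counts)
  data.frame(
    variable = variable,
    cohort = cohort,
    category = category,
    value = value,
    cohort_n = cohort_n,
    valid_n = valid_n,
    missing_n = missing_n,
    # Assumption: perc_valid is undefined (NA) for the "missing" category.
    perc_valid = round(ifelse(category == "missing", NA, 100 * value / valid_n), digits),
    perc_total = round(100 * value / cohort_n, digits),
    stringsAsFactors = FALSE
  )
}

cat_stats(factor(c("a", "b", "a", NA, "b", "a")), variable = "x")
```

For this input cohort_n is 6 and valid_n is 5, so category "a" (3 observations) has a perc_valid of 60 and a perc_total of 50.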

Continuous:

variable = As above.
cohort = As above.
mean = Mean. The pooled value is calculated by fixed-effect meta-analysis.
std.dev = Standard deviation. The pooled value is also calculated by fixed-effect meta-analysis.
perc_5, perc_10, perc_25, perc_50, perc_75, perc_90, perc_95 = 5th to 95th percentile values.
valid_n = As above.
cohort_n = As above.
missing_n = As above.
missing_perc = Percentage of observations missing.
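The continuous columns for one cohort, and one common form of fixed-effect pooling of cohort means, can be sketched in base R as below. This is an illustration under stated assumptions, not dsHelper's code: the helper names `cont_stats` and `pool_mean_fe` are hypothetical, inverse-variance weighting (weight = n / sd^2, i.e. 1 / SE^2) is assumed as the fixed-effect method, missing_perc is assumed to be relative to cohort_n, and pooling of std.dev is not shown because its exact formula is not stated here.

```r
# Hypothetical helpers (not part of dsHelper), run locally on plain R vectors.

# Per-cohort continuous columns.
cont_stats <- function(x, variable, cohort = "cohort_1", digits = 2) {
  cohort_n <- length(x)
  missing_n <- sum(is.na(x))
  pcts <- c(5, 10, 25, 50, 75, 90, 95)
  stats <- c(mean(x, na.rm = TRUE), sd(x, na.rm = TRUE),
             quantile(x, probs = pcts / 100, na.rm = TRUE, names = FALSE))
  names(stats) <- c("mean", "std.dev", paste0("perc_", pcts))
  data.frame(
    variable = variable, cohort = cohort,
    as.list(round(stats, digits)),         # one column per named statistic
    valid_n = cohort_n - missing_n, cohort_n = cohort_n, missing_n = missing_n,
    # Assumption: missing_perc is expressed relative to cohort_n.
    missing_perc = round(100 * missing_n / cohort_n, digits)
  )
}

# Fixed-effect (inverse-variance) pooling of cohort means: each cohort is
# weighted by 1 / SE^2, where SE = sd / sqrt(n), so the weight is n / sd^2.
pool_mean_fe <- function(means, sds, ns) {
  w <- ns / sds^2
  sum(w * means) / sum(w)
}

cont_stats(c(1:10, NA), variable = "x")
pool_mean_fe(means = c(10, 12), sds = c(2, 2), ns = c(100, 300))
```

With equal standard deviations of 2 and sizes of 100 and 300, the weights are 25 and 75, so the pooled mean is 11.5, closer to the larger cohort's mean.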

Details

This function also overcomes an issue with ds.summary, which throws an error if the variable is missing from one or more studies. By contrast, dh.getStats returns the variable for that cohort with all statistics set to NA. See 'Value' for details of the returned statistics.
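The NA-filling behaviour described above can be illustrated locally with a list of per-cohort data frames standing in for the DataSHIELD connections. This is a hypothetical sketch (the helper `summarise_or_na` and its reduced set of columns are illustrative only), not how dh.getStats queries the servers.

```r
# Hypothetical helper: summarise `var` in each cohort's data frame; if the
# column is absent in a cohort, return a row of NAs instead of raising an error.
summarise_or_na <- function(cohort_dfs, var) {
  rows <- lapply(names(cohort_dfs), function(cohort) {
    df <- cohort_dfs[[cohort]]
    if (var %in% names(df)) {
      x <- df[[var]]
      data.frame(cohort = cohort, mean = mean(x, na.rm = TRUE),
                 valid_n = sum(!is.na(x)))
    } else {
      # Variable not present in this cohort: keep the cohort, fill with NA.
      data.frame(cohort = cohort, mean = NA_real_, valid_n = NA_integer_)
    }
  })
  do.call(rbind, rows)
}

cohorts <- list(a = data.frame(bmi = c(20, 22, NA)), b = data.frame(age = 30))
summarise_or_na(cohorts, "bmi")
```

Cohort "a" gets a mean of 21 from 2 valid observations, while cohort "b", which has no bmi column, still appears in the output with NA values.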
