Generate an observation period table

Description

Generates a temporary observation period table based on the first and last event in the electronic medical record data. Because some EHR sites have contributed data from several decades ago, researchers might want to consider further constraining this table to reasonable date ranges of interest (e.g., setting all observation_period_start_date values to no earlier than 2010-01-01).

Usage

aou_observation_period(
  cohort = NULL,
  collect = FALSE,
  ...,
  con = getOption("aou.default.con")
)

Arguments

cohort Reference to a remote table or local dataframe with a column called "person_id"
collect Whether to bring the resulting table into local memory (collect = TRUE) as a dataframe or leave as a reference to a database table (for continued analysis using, e.g., dbplyr). Defaults to FALSE.
... Further arguments passed along to collect() if collect = TRUE
con Connection to the allofus SQL database. Defaults to getOption("aou.default.con"), which is set automatically if you use aou_connect()

Details

[Experimental]

The current observation period table in the All of Us OMOP CDM is not always appropriate for cohorts generated using OHDSI tools such as ATLAS. Some observation periods are overly short and some participants have hundreds of observation periods.

This function generates an observation period table from the first occurrence of a clinical event in the EHR tables to the last clinical event in the EHR tables. It will only return a single observation period per person_id in the database. If collect = FALSE, the function returns a query to a temporary table in the database which can be referenced by typical dplyr functions.
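The first-to-last-event logic described above can be sketched on a small local data frame. This is only an illustration: the `events` data frame and its rows are made up, and the code is not the package's implementation, which queries the EHR tables in the database.

```r
library(dplyr)

# Hypothetical toy data: one row per clinical event (not real All of Us data)
events <- data.frame(
  person_id = c(1, 1, 1, 2, 2),
  event_date = as.Date(c(
    "2012-03-01", "2015-07-15", "2019-11-30",
    "2018-01-10", "2018-06-05"
  ))
)

# One observation period per person: earliest event to latest event
toy_observation_period <- events %>%
  group_by(person_id) %>%
  summarize(
    observation_period_start_date = min(event_date),
    observation_period_end_date = max(event_date),
    .groups = "drop"
  )

toy_observation_period
```

Each person_id ends up with exactly one row, regardless of how many events or gaps between events they have.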

Normal OMOP conventions for EHR suggest that long lapses of time between clinical events may indicate that the person was not "observed" during this period. However, due to the diverse nature of clinical EHR data contributed to All of Us, it seems most conservative to assume that the person was observed from their first to last clinical event. See https://ohdsi.github.io/CommonDataModel/ehrObsPeriods.html for more details.

Some participants have clinical events going back to before the time of widespread electronic medical record use (e.g., the 1980s and 1990s). This function considers all EHR data in the database, regardless of the date of the clinical event, but we recommend that users consider the implications of including data from the 1980s and 1990s. It may be more prudent to exclude data prior to a more recent cutoff date so that the EHR data is more likely to be accurate, though this decision depends highly on the research question (see example below).

Users should note that the aou_observation_period function will only generate observation periods for participants who have at least one clinical observation. If the cohort argument includes participants in the All of Us Research Program who did not contribute electronic health record data, or who elected to contribute data but have no data to contribute, those participants will not be included in the generated observation period table.
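One way to see which cohort members drop out is an anti-join of the cohort against the generated table. The following is a minimal sketch on made-up local data frames (`toy_cohort` and `toy_observation_period` are hypothetical names, not objects created by the package); the same dplyr verbs should apply to the real tables, though that is not shown here.

```r
library(dplyr)

# Hypothetical cohort of three participants
toy_cohort <- data.frame(person_id = c(1, 2, 3))

# Hypothetical observation period table: person 3 has no clinical events,
# so no row was generated for them
toy_observation_period <- data.frame(
  person_id = c(1, 2),
  observation_period_start_date = as.Date(c("2012-03-01", "2018-01-10")),
  observation_period_end_date = as.Date(c("2019-11-30", "2018-06-05"))
)

# Cohort members with no generated observation period
missing_from_table <- toy_cohort %>%
  anti_join(toy_observation_period, by = "person_id")

missing_from_table
```

Checking the size of this set before an inner join makes it clear how many participants the join will silently exclude.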

Value

A dataframe if collect = TRUE; a reference to a remote database table if not. Columns will be "person_id", "observation_period_start_date", and "observation_period_end_date".

Examples

library(allofus)
library(dplyr)
con <- aou_connect()

# create observation_period table for everyone
observation_period_tbl <- aou_observation_period()

# create a cohort of participants with EHR data and at least one year
# of observation before they took the first survey

# first, create an index date as the first date a survey was taken
index_date_tbl <- tbl(con, "ds_survey") %>%
  group_by(person_id) %>%
  summarize(index_date = as.Date(min(survey_datetime, na.rm = TRUE)),
            .groups = "drop")

# join with observation_period_tbl
cohort <- tbl(con, "cb_search_person") %>%
  filter(has_ehr_data == 1) %>%
  inner_join(index_date_tbl, by = "person_id") %>%
  inner_join(observation_period_tbl, by = "person_id") %>%
  filter(
    observation_period_start_date <= DATE_ADD(
      index_date,
      sql(paste0("INTERVAL ", -1, " year"))
    ),
    index_date <= observation_period_end_date
  ) %>%
  select(person_id, gender, sex_at_birth,
    race, ethnicity, age_at_consent,
    index_date, observation_period_start_date, observation_period_end_date)

# head(cohort)

# create an observation period table with a minimum start date (e.g., 2010-01-01)
# to only look at EHR data after that date
observation_period_tbl %>%
  mutate(
    observation_period_start_date =
      if_else(observation_period_start_date < as.Date("2010-01-01"),
        as.Date("2010-01-01"),
        observation_period_start_date
      )
  ) %>%
  filter(observation_period_end_date > as.Date("2010-01-01"))