Gaps in mobile sensing data typically occur when the app is stopped by the operating system or
the user. While small gaps may not pose problems for analyses, larger gaps may bias or
skew your data. Gap data should therefore be taken into account so that their influence can
be inspected and limited. This function, analogous to link(), allows you to connect gaps to other data
(usually ESM/EMA data) within a user-specified time range.
Arguments
- data
A data frame or an extension to a data frame (e.g. a tibble). While gap data can be linked to any other type of data, ESM data is most commonly used.
- gaps
A data frame (extension) containing the gap data. See identify_gaps() for retrieving gap data from an mpathsenser database. It should at least contain the columns from and to (both in a date-time format), as well as any columns specified in by.
- by
A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y. Therefore, all data will be mapped to each other based on the time stamps of x and y. A message lists the variables so that you can check they're correct; suppress the message by supplying by explicitly.
To join by different variables on x and y, use a named vector. For example, by = c('a' = 'b') will match x$a to y$b.
To join by multiple variables, use a vector with length > 1. For example, by = c('a', 'b') will match x$a to y$a and x$b to y$b. Use a named vector to match different variables in x and y. For example, by = c('a' = 'b', 'c' = 'd') will match x$a to y$b and x$c to y$d.
To perform a cross-join (when x and y have no variables in common), use by = character(). Note that the split argument will then be set to 1.
- offset_before
The time before each measurement in x that denotes the period in which y is matched. Must be convertible to a period by lubridate::as.period().
- offset_after
The time after each measurement in x that denotes the period in which y is matched. Must be convertible to a period by lubridate::as.period().
- raw_data
Whether to include the raw data (i.e. the matched gap data) in the output as gap_data.
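As a rough illustration of how offset_before and offset_after together define the matching window around a measurement, the base R sketch below builds that window by hand. It does not call link_gaps(); the measurement time and offsets are made-up values, and the offsets are assumed to be given in seconds.

```r
# Hypothetical measurement time (not taken from a real data set)
measurement <- as.POSIXct("2021-11-14 13:00:00", tz = "UTC")

# Offsets in seconds; assumed here to mirror offset_before and offset_after
offset_before <- 900   # 15 minutes before the measurement
offset_after  <- 1800  # 30 minutes after the measurement

# The window in which gaps would be matched to this measurement
window_start <- measurement - offset_before
window_end   <- measurement + offset_after

# Total window length in seconds
window_length <- as.numeric(difftime(window_end, window_start, units = "secs"))
```

With both offsets non-zero, the window spans the measurement on either side; setting offset_before to 0 restricts matching to the period after the measurement.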
Value
The original data with an extra column gap indicating the total duration of the gaps within the
interval in seconds and, if raw_data is TRUE, an extra column called gap_data containing
the gaps within the interval. The function ensures all durations and gap time stamps are within
the range of the interval.
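The restriction of gaps to the interval can be sketched in base R as follows. This only illustrates the idea (truncate each gap to the interval, then sum the remaining seconds) and is not the package's actual implementation; clip_gaps() is a hypothetical helper and the time stamps are made up.

```r
# Hypothetical helper: total gap time (in seconds) that falls inside one window.
# 'from' and 'to' are POSIXct vectors holding the start and end of each gap.
clip_gaps <- function(from, to, window_start, window_end) {
  # Work on seconds since the epoch to keep the arithmetic simple
  start <- pmax(as.numeric(from), as.numeric(window_start))
  end   <- pmin(as.numeric(to), as.numeric(window_end))

  # Gaps entirely outside the window yield a negative length; count them as 0
  sum(pmax(end - start, 0))
}

gap_from <- as.POSIXct(c("2021-11-14 12:50:00", "2021-11-14 14:00:00"), tz = "UTC")
gap_to   <- as.POSIXct(c("2021-11-14 13:10:00", "2021-11-14 14:30:00"), tz = "UTC")

window_start <- as.POSIXct("2021-11-14 13:00:00", tz = "UTC")
window_end   <- as.POSIXct("2021-11-14 13:30:00", tz = "UTC")

# Only the 13:00 to 13:10 part of the first gap lies inside the window
total <- clip_gaps(gap_from, gap_to, window_start, window_end)
```

Here the first gap starts before the window and is truncated to its last ten minutes, while the second gap lies entirely outside the window and contributes nothing.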
See also
bin_data() for linking two sets of intervals to each other; identify_gaps() for
finding gaps in the sampling; add_gaps() for adding gaps to sensor data.
Examples
# Create some data
x <- data.frame(
  time = rep(seq.POSIXt(as.POSIXct("2021-11-14 13:00:00"), by = "1 hour", length.out = 3), 2),
  participant_id = c(rep("12345", 3), rep("23456", 3)),
  item_one = rep(c(40, 50, 60), 2)
)
# Create some gaps
gaps <- data.frame(
  from = as.POSIXct(c("2021-11-14 13:00:00", "2021-11-14 14:00:00")),
  to = as.POSIXct(c("2021-11-14 13:30:00", "2021-11-14 14:30:00")),
  participant_id = c("12345", "23456")
)
# Link the gaps to the data
link_gaps(x, gaps, by = "participant_id", offset_before = 0, offset_after = 1800)
#> # A tibble: 6 × 4
#>   time                participant_id item_one   gap
#>   <dttm>              <chr>             <dbl> <dbl>
#> 1 2021-11-14 13:00:00 12345                40  1800
#> 2 2021-11-14 14:00:00 12345                50     0
#> 3 2021-11-14 15:00:00 12345                60     0
#> 4 2021-11-14 13:00:00 23456                40     0
#> 5 2021-11-14 14:00:00 23456                50  1800
#> 6 2021-11-14 15:00:00 23456                60     0
# Link the gaps to the data and include the raw data
link_gaps(
  x,
  gaps,
  by = "participant_id",
  offset_before = 0,
  offset_after = 1800,
  raw_data = TRUE
)
#> # A tibble: 6 × 5
#>   time                participant_id item_one gap_data           gap
#>   <dttm>              <chr>             <dbl> <list>           <dbl>
#> 1 2021-11-14 13:00:00 12345                40 <tibble [1 × 3]>  1800
#> 2 2021-11-14 14:00:00 12345                50 <tibble [0 × 3]>     0
#> 3 2021-11-14 15:00:00 12345                60 <tibble [0 × 3]>     0
#> 4 2021-11-14 13:00:00 23456                40 <tibble [0 × 3]>     0
#> 5 2021-11-14 14:00:00 23456                50 <tibble [1 × 3]>  1800
#> 6 2021-11-14 15:00:00 23456                60 <tibble [0 × 3]>     0
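One way the gap column might be used afterwards is to express it as a proportion of the window and drop observations where too much of the window is missing. The sketch below recreates the relevant columns of the output above by hand so that it runs without the package; the 0.5 threshold and the names gap_prop and max_gap_prop are arbitrary choices for illustration, not recommendations.

```r
# Hand-made copy of the relevant columns from the output above
linked <- data.frame(
  participant_id = c(rep("12345", 3), rep("23456", 3)),
  item_one = rep(c(40, 50, 60), 2),
  gap = c(1800, 0, 0, 0, 1800, 0)
)

# Window length in seconds (offset_before + offset_after in the example)
window_secs <- 0 + 1800

# Proportion of each window that was covered by gaps
linked$gap_prop <- linked$gap / window_secs

# Keep only observations where at most half of the window is missing
# (arbitrary threshold, chosen for this illustration only)
max_gap_prop <- 0.5
kept <- linked[linked$gap_prop <= max_gap_prop, ]
```

In this example the two observations whose window was entirely covered by a gap are removed, leaving four rows.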
