Since mobile sensing data may contain many gaps, it is important to account for them in
the analysis. This function adds known gaps to the data as "measurements", which makes
calculations such as durations easier. For instance, suppose a participant spent 30 minutes
walking, but it is known that there was a 15-minute gap within this interval. This gap
should somehow be accounted for. add_gaps() does this by adding the gap data to the
sensor data, splitting the intervals in which gaps occur.
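To make the arithmetic of the walking example concrete, here is a minimal base R sketch that does not use add_gaps() itself. The timestamps and the UTC time zone are illustrative assumptions. The naive duration counts the gap as walking, whereas subtracting the overlap with the known gap leaves the 15 minutes that were actually observed.

```r
# Illustrative times for the 30-minute walking example (assumed, UTC)
walk_start <- as.POSIXct("2022-05-10 10:00:00", tz = "UTC")
walk_end   <- as.POSIXct("2022-05-10 10:30:00", tz = "UTC")

# A known 15-minute gap inside that interval, starting after 5 minutes
gap_from <- as.POSIXct("2022-05-10 10:05:00", tz = "UTC")
gap_to   <- as.POSIXct("2022-05-10 10:20:00", tz = "UTC")

# Naive duration: treats the whole interval as walking, including the gap
naive <- as.numeric(difftime(walk_end, walk_start, units = "mins"))

# Gap-aware duration: subtract the part of the gap that overlaps the interval
overlap <- as.numeric(difftime(min(walk_end, gap_to), max(walk_start, gap_from),
                               units = "mins"))
walking <- naive - max(overlap, 0)

naive    # 30
walking  # 15
```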
Arguments
- data
A data frame containing the data. See get_data() for retrieving data from an mpathsenser database.
- gaps
A data frame (extension) containing the gap data. See identify_gaps() for retrieving gap data from an mpathsenser database. It should at least contain the columns from and to (both in a date-time format), as well as any columns specified in by.
- by
A character vector indicating the variable(s) to match by, typically the participant IDs. If NULL, the default, *_join() will perform a natural join, using all variables in common across x and y.
- continue
Whether to continue the measurement(s) prior to the gap once the gap ends.
- fill
A named list of the columns to fill with default values for the extra measurements that are added because of the gaps.
Details
In the example of 30 minutes walking where a 15-minute gap occurred (say, after 5
minutes), add_gaps() with continue = TRUE adds two rows: one 5 minutes after the start of
the interval, indicating the start of the gap (containing values from fill, if specified),
and one 20 minutes after the start of the interval, signalling that the walking activity
continues. Then, when calculating time differences between subsequent measurements, the
gap period is appropriately accounted for. Note that if multiple measurements occurred
before the gap, they will all be continued after the gap.
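The row-insertion logic described above can be sketched in base R. The function add_gaps_sketch() is hypothetical and only illustrates the idea; it is not mpathsenser's implementation. It assumes a time column in the data and from/to columns in the gaps, matches rows on the by columns with a simple equality filter instead of a join, and interprets "multiple measurements before the gap" as all rows sharing the latest timestamp before that gap.

```r
# Hypothetical sketch of the idea behind add_gaps(); NOT mpathsenser's code.
# Assumes `data` has a POSIXct `time` column and `gaps` has POSIXct `from` and
# `to` columns, plus any matching columns named in `by`.
add_gaps_sketch <- function(data, gaps, by = NULL, fill = list(),
                            continue = FALSE) {
  new_rows <- list()
  for (i in seq_len(nrow(gaps))) {
    # Rows of `data` that belong to the same participant (etc.) as this gap
    same <- rep(TRUE, nrow(data))
    for (col in by) same <- same & data[[col]] == gaps[[col]][i]

    # Row marking the start of the gap: NA everywhere except by, time and fill
    gap_row <- data[1, , drop = FALSE]
    gap_row[1, ] <- NA
    for (col in by) gap_row[[col]] <- gaps[[col]][i]
    gap_row$time <- gaps$from[i]
    for (col in names(fill)) gap_row[[col]] <- fill[[col]]
    new_rows <- c(new_rows, list(gap_row))

    if (continue) {
      # Restart all measurement(s) from the latest time point before the gap
      before <- data[same & data$time < gaps$from[i], , drop = FALSE]
      if (nrow(before) > 0) {
        resumed <- before[before$time == max(before$time), , drop = FALSE]
        resumed$time <- gaps$to[i]
        new_rows <- c(new_rows, list(resumed))
      }
    }
  }
  out <- do.call(rbind, c(list(data), new_rows))
  out <- out[order(out$time), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Same data and gaps as in the Examples section (UTC assumed)
dat <- data.frame(
  participant_id = "12345",
  time = as.POSIXct(c("2022-05-10 10:00:00", "2022-05-10 10:30:00",
                      "2022-05-10 11:30:00"), tz = "UTC"),
  type = c("WALKING", "STILL", "RUNNING"),
  confidence = c(80, 100, 20),
  stringsAsFactors = FALSE
)
gaps <- data.frame(
  participant_id = "12345",
  from = as.POSIXct(c("2022-05-10 10:05:00", "2022-05-10 10:50:00"), tz = "UTC"),
  to = as.POSIXct(c("2022-05-10 10:20:00", "2022-05-10 11:10:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

res <- add_gaps_sketch(dat, gaps, by = "participant_id", continue = TRUE)

# Each row lasts until the next row; the last row has no known end
res$duration <- c(as.numeric(diff(res$time), units = "mins"), NA)
sum(res$duration[res$type %in% "WALKING"])  # 15, not 30
```

With continue = TRUE, WALKING resumes at 10:20 and STILL at 11:10, so the time spent walking comes out as 5 + 10 = 15 minutes rather than the 30 minutes a naive difference would suggest.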
Warning
Depending on the sensor that is used to identify the gaps (typically the sensor with the
highest sampling frequency, such as the accelerometer or gyroscope), there may be a small
delay between the identified start of the gap and its actual start. For example, if the
accelerometer samples every 5 seconds, the app may have been killed 4.99 seconds after the
last accelerometer measurement (i.e., just before the next measurement would have been
taken). However, other sensors may still have recorded measurements within that time, so
these measurements technically occur "within" the gap. This is especially important if you
want to use these gaps in add_gaps(), since this issue may lead to erroneous results.
An easy way to solve this problem is to take all sensors of interest into account when identifying the gaps, thereby ensuring that none of these sensors has measurements within a gap. Another way is to search for gaps 5 seconds (the sampling interval in this example) longer than you want and afterwards increase the start time of each gap by 5 seconds.
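The second workaround can be sketched in base R on a hand-made gaps data frame. The candidate gaps, the 60-second minimum, and the 5-second delay are illustrative assumptions; in practice the gaps would come from identify_gaps(). Filtering on the longer threshold before shifting is what guarantees that every remaining gap still lasts at least the desired minimum.

```r
# Illustrative settings (assumptions): we care about gaps of at least 60 s and
# the gap-defining sensor samples every 5 s.
desired_min <- 60  # seconds
delay <- 5         # seconds, the sampling interval of the gap-defining sensor

# Hypothetical candidate gaps; in practice these would come from identify_gaps()
candidates <- data.frame(
  participant_id = "12345",
  from = as.POSIXct(c("2022-05-10 10:00:00", "2022-05-10 10:05:00",
                      "2022-05-10 10:50:00"), tz = "UTC"),
  to = as.POSIXct(c("2022-05-10 10:01:02", "2022-05-10 10:20:00",
                    "2022-05-10 11:10:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Step 1: keep only gaps that are `delay` seconds longer than desired
len <- as.numeric(difftime(candidates$to, candidates$from, units = "secs"))
gaps <- candidates[len >= desired_min + delay, , drop = FALSE]

# Step 2: move each gap start forward by `delay` seconds (POSIXct + numeric
# adds seconds), so measurements taken shortly after the last sample of the
# gap-defining sensor no longer fall inside the gap
gaps$from <- gaps$from + delay

# Every remaining gap still lasts at least `desired_min` seconds
as.numeric(difftime(gaps$to, gaps$from, units = "secs")) >= desired_min  # TRUE TRUE
```

The 62-second candidate is dropped in step 1: after shifting it would only span 57 seconds, below the 60 seconds we wanted.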
See also
identify_gaps() for finding gaps in the sampling; link_gaps() for linking gaps to
ESM data, analogous to link().
Examples
# Define some data
dat <- data.frame(
participant_id = "12345",
time = as.POSIXct(c("2022-05-10 10:00:00", "2022-05-10 10:30:00", "2022-05-10 11:30:00")),
type = c("WALKING", "STILL", "RUNNING"),
confidence = c(80, 100, 20)
)
# Get the gaps from identify_gaps, but in this example define them ourselves
gaps <- data.frame(
participant_id = "12345",
from = as.POSIXct(c("2022-05-10 10:05:00", "2022-05-10 10:50:00")),
to = as.POSIXct(c("2022-05-10 10:20:00", "2022-05-10 11:10:00"))
)
# Now add the gaps to the data
add_gaps(
data = dat,
gaps = gaps,
by = "participant_id"
)
#> # A tibble: 5 × 4
#> participant_id time type confidence
#> <chr> <dttm> <chr> <dbl>
#> 1 12345 2022-05-10 10:00:00 WALKING 80
#> 2 12345 2022-05-10 10:05:00 NA NA
#> 3 12345 2022-05-10 10:30:00 STILL 100
#> 4 12345 2022-05-10 10:50:00 NA NA
#> 5 12345 2022-05-10 11:30:00 RUNNING 20
# You can use fill if you want to get rid of those pesky NAs
add_gaps(
data = dat,
gaps = gaps,
by = "participant_id",
fill = list(type = "GAP", confidence = 100)
)
#> # A tibble: 5 × 4
#> participant_id time type confidence
#> <chr> <dttm> <chr> <dbl>
#> 1 12345 2022-05-10 10:00:00 WALKING 80
#> 2 12345 2022-05-10 10:05:00 GAP 100
#> 3 12345 2022-05-10 10:30:00 STILL 100
#> 4 12345 2022-05-10 10:50:00 GAP 100
#> 5 12345 2022-05-10 11:30:00 RUNNING 20
