In time series with variable measurements, a recurring task is calculating the total time spent (i.e. the duration) in fixed bins, for example per hour or per day. This is difficult when two subsequent measurements fall in different bins or when the interval between them spans multiple bins.
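To make the difficulty concrete, here is a minimal base R sketch of splitting a single interval across hourly bins. The helper `split_by_hour()` is hypothetical and not part of this package. It assumes that `end` is later than `start` and that the time zone has a whole-hour UTC offset, so that flooring epoch seconds to multiples of 3600 lands on whole hours.

```r
# Hypothetical helper (not part of this package): split one interval into
# hourly pieces and report the seconds spent in each hour.
split_by_hour <- function(start, end) {
  s <- as.numeric(start)
  e <- as.numeric(end)

  # Start of the hour that contains `start`
  first_bin <- floor(s / 3600) * 3600

  # Every hour start that lies before the end of the interval
  bins <- seq(first_bin, e, by = 3600)
  bins <- bins[bins < e]

  # Clip the interval to each hour
  piece_start <- pmax(bins, s)
  piece_end <- pmin(bins + 3600, e)

  data.frame(
    bin = as.POSIXct(bins, origin = "1970-01-01", tz = "UTC"),
    seconds = piece_end - piece_start
  )
}

res <- split_by_hour(
  as.POSIXct("2022-06-21 15:55:00", tz = "UTC"),
  as.POSIXct("2022-06-21 17:05:00", tz = "UTC")
)
print(res)
# seconds: 300, 3600, 300 (for the bins starting at 15:00, 16:00 and 17:00)
```

An interval from 15:55 to 17:05 therefore contributes to three hourly bins, which a simple difference between the two timestamps would not reveal.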
Usage
bin_data(
  data,
  start_time,
  end_time,
  by = c("sec", "min", "hour", "day"),
  fixed = TRUE,
  .name = "bin"
)
Arguments
- data
A data frame or tibble containing the time series.
- start_time
The column name of the start time of the interval, a POSIXt.
- end_time
The column name of the end time of the interval, a POSIXt.
- by
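A binning specification: one of "sec", "min", "hour", or "day". As shown in the examples, an integer number of seconds may also be given to create custom-sized bins, but only if fixed = FALSE.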
- fixed
Whether to create fixed bins. If TRUE, bins will be rounded to, for example, whole hours or days (depending on by). If FALSE, bins will be created based on the first timestamp.
- .name
The name of the column containing the nested data.
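The difference between fixed and non-fixed bins can be illustrated with a small base R sketch. This only mirrors the description of the fixed argument above and is not the package's internal code. The timestamps and the one-hour bin width are illustrative assumptions.

```r
# Illustrative values (assumptions, not taken from the package)
times <- as.POSIXct(
  c("2022-06-21 15:10:00", "2022-06-21 17:05:00"),
  tz = "UTC"
)
width <- 3600 # bin width in seconds (one hour)

first <- as.numeric(min(times))
last <- as.numeric(max(times))

# Fixed bins: the first edge is rounded down to a whole hour
fixed_edges <- as.POSIXct(
  seq(floor(first / width) * width, last, by = width),
  origin = "1970-01-01", tz = "UTC"
)

# Non-fixed bins: the first edge is the first timestamp itself
free_edges <- as.POSIXct(
  seq(first, last, by = width),
  origin = "1970-01-01", tz = "UTC"
)

format(fixed_edges, "%H:%M") # "15:00" "16:00" "17:00"
format(free_edges, "%H:%M")  # "15:10" "16:10"
```

With fixed edges, bins from different groups or participants line up on the same clock hours. With non-fixed edges, they depend on when each series happens to start.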
Value
A tibble containing the group columns (if any), the start time of each bin (bin), and a list column holding the rows of data that fall within that bin (bin_data in the examples below).
See also
link_gaps() for linking gaps to data.
Examples
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = "WALKING"
)
# get bins per hour, even if the interval is longer than one hour
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )
#> # A tibble: 3 × 2
#>   bin                 bin_data
#>   <dttm>              <list>
#> 1 2022-06-21 15:00:00 <tibble [2 × 5]>
#> 2 2022-06-21 16:00:00 <tibble [1 × 5]>
#> 3 2022-06-21 17:00:00 <tibble [3 × 5]>
# Alternatively, you can give an integer value to by to create custom-sized
# bins, but only if fixed = FALSE. Note that these bins are not rounded to
# multiples of the bin size (30 minutes in this example), but rather depend
# on the earliest time in the group.
data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  mutate(lead = lead(datetime)) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = 1800L,
    fixed = FALSE
  )
#> # A tibble: 5 × 2
#>   bin                 bin_data
#>   <dttm>              <list>
#> 1 2022-06-21 15:00:00 <tibble [1 × 5]>
#> 2 2022-06-21 15:30:00 <tibble [2 × 5]>
#> 3 2022-06-21 16:00:00 <tibble [1 × 5]>
#> 4 2022-06-21 16:30:00 <tibble [1 × 5]>
#> 5 2022-06-21 17:00:00 <tibble [3 × 5]>
# More complicated data for showcasing grouping:
data <- tibble(
  participant_id = 1,
  datetime = c(
    "2022-06-21 15:00:00", "2022-06-21 15:55:00",
    "2022-06-21 17:05:00", "2022-06-21 17:10:00"
  ),
  confidence = 100,
  type = c("STILL", "WALKING", "STILL", "WALKING")
)
# bin_data() also takes the prior grouping structure into account
out <- data |>
  mutate(datetime = as.POSIXct(datetime)) |>
  group_by(participant_id) |>
  mutate(lead = lead(datetime)) |>
  group_by(participant_id, type) |>
  bin_data(
    start_time = datetime,
    end_time = lead,
    by = "hour"
  )
print(out)
#> # A tibble: 6 × 4
#> # Groups:   participant_id, type [2]
#>   participant_id type    bin                 bin_data
#>            <dbl> <chr>   <dttm>              <list>
#> 1              1 STILL   2022-06-21 15:00:00 <tibble [1 × 3]>
#> 2              1 STILL   2022-06-21 16:00:00 <tibble [0 × 3]>
#> 3              1 STILL   2022-06-21 17:00:00 <tibble [1 × 3]>
#> 4              1 WALKING 2022-06-21 15:00:00 <tibble [1 × 3]>
#> 5              1 WALKING 2022-06-21 16:00:00 <tibble [1 × 3]>
#> 6              1 WALKING 2022-06-21 17:00:00 <tibble [2 × 3]>
# To get the duration for each bin (remember to adapt the variable names
# inside sum() to your own data):
purrr::map_dbl(
  out$bin_data,
  ~ sum(as.double(.x$lead) - as.double(.x$datetime),
    na.rm = TRUE
  )
)
#> [1] 3300 0 300 300 3600 300
# Or:
out |>
  tidyr::unnest(bin_data, keep_empty = TRUE) |>
  mutate(duration = .data$lead - .data$datetime) |>
  group_by(bin, .add = TRUE) |>
  summarise(duration = sum(.data$duration, na.rm = TRUE), .groups = "drop")
#> # A tibble: 6 × 4
#>   participant_id type    bin                 duration
#>            <dbl> <chr>   <dttm>              <drtn>
#> 1              1 STILL   2022-06-21 15:00:00 55 mins
#> 2              1 STILL   2022-06-21 16:00:00  0 mins
#> 3              1 STILL   2022-06-21 17:00:00  5 mins
#> 4              1 WALKING 2022-06-21 15:00:00  5 mins
#> 5              1 WALKING 2022-06-21 16:00:00 60 mins
#> 6              1 WALKING 2022-06-21 17:00:00  5 mins
