Apply functions to two sets of columns simultaneously in 'dplyr'

across2() and across2x() are variants of dplyr::across() that iterate over two sets of columns simultaneously. across2() applies one or more functions to each pair of columns in .xcols and .ycols, while across2x() applies one or more functions to every combination of columns in .xcols and .ycols.

across2(.xcols, .ycols, .fns, ..., .names = NULL, .names_fn = NULL)

across2x(
  .xcols,
  .ycols,
  .fns,
  ...,
  .names = NULL,
  .names_fn = NULL,
  .comb = "all"
)

Arguments

.xcols, .ycols	<`tidy-select`> Columns to transform. Note that you cannot select or compute upon grouping variables.
.fns	Functions to apply to each column in `.xcols` and `.ycols`. Possible values are: a function; a purrr-style lambda; a list of functions/lambdas. Note that `NULL` is not accepted as an argument to `.fns`.
...	Additional arguments for the function calls in `.fns`.
.names	A glue specification that describes how to name the output columns. This can use: `{xcol}` to stand for the selected column name in `.xcols`, `{ycol}` to stand for the selected column name in `.ycols`, and `{fn}` to stand for the name of the function being applied. The default (`NULL`) is equivalent to `"{xcol}_{ycol}"` for the single function case and `"{xcol}_{ycol}_{fn}"` for the case where a list is used for `.fns`. `across2()` supports two additional glue specifications: `{pre}` and `{suf}`. They extract the common alphanumeric prefix or suffix of each pair of variables. As an alternative to a glue specification, a character vector of length equal to the number of columns to be created can be supplied to `.names`. Note that in this case, the glue specification described above is not supported.
.names_fn	Optionally, a function that is applied after the glue specification in `.names` has been evaluated. This is helpful, for example, when the resulting names need to be further cleaned or trimmed.
.comb	In `across2x()`, this argument controls which combinations of columns are created. It only matters if the columns specified in `.xcols` and `.ycols` overlap to some extent. Possible values are: `"all"` (the default) creates all pairwise combinations between columns in `.xcols` and `.ycols`, including all permutations (e.g. `foo(column_x, column_y)` as well as `foo(column_y, column_x)`); `"unique"` creates only unordered combinations (e.g. `foo(column_x, column_y)` is created, but `foo(column_y, column_x)` is not); `"minimal"` is the same as `"unique"` but also skips all self-matches (e.g. `foo(column_x, column_x)` is not created).
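
The three `.comb` settings can be illustrated with a small base R sketch. It is not the package's implementation: the column names `a`, `b` and `c` are hypothetical, the same set is assumed for both `.xcols` and `.ycols`, and which member of a mirrored pair `"unique"` keeps is an assumption made here for illustration.

```r
# Hypothetical column names, used for both .xcols and .ycols
cols <- c("a", "b", "c")

# All ordered index pairs; x varies slowest so output is grouped by x column
idx <- expand.grid(y = seq_along(cols), x = seq_along(cols))

# Turn index pairs into "{xcol}_{ycol}"-style labels
label <- function(d) paste(cols[d$x], cols[d$y], sep = "_")

comb_all     <- label(idx)                     # every permutation, incl. self-matches
comb_unique  <- label(idx[idx$x <= idx$y, ])   # one of each mirrored pair
comb_minimal <- label(idx[idx$x <  idx$y, ])   # additionally drops self-matches

comb_all
comb_unique
comb_minimal
```

With three overlapping columns this gives nine, six and three combinations respectively, so `"minimal"` is the setting that avoids both mirrored and self-matched results.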

Value

across2() returns a tibble with one column for each pair of elements in .xcols and .ycols combined with each function in .fns.

across2x() returns a tibble with one column for each combination between elements in .xcols and .ycols combined with each function in .fns.
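
To make the shape of the `across2()` result concrete, the following base R sketch mimics the pairing by hand. It is only an illustration under stated assumptions, not the package's code: the loop, the variable names `xcols`, `ycols`, `fns` and `out`, and the use of `paste()` in place of the default `"{xcol}_{ycol}_{fn}"` glue specification are all choices made here.

```r
dat <- datasets::iris

# Two column sets of equal length; position i in xcols is paired with position i in ycols
xcols <- c("Sepal.Length", "Petal.Length")
ycols <- c("Sepal.Width", "Petal.Width")

# A named list of two-argument functions
fns <- list(
  product = function(x, y) x * y,
  sum     = function(x, y) x + y
)

out <- list()
for (i in seq_along(xcols)) {
  for (fn in names(fns)) {
    # Mimics the default name "{xcol}_{ycol}_{fn}" for a list of functions
    nm <- paste(xcols[i], ycols[i], fn, sep = "_")
    out[[nm]] <- fns[[fn]](dat[[xcols[i]]], dat[[ycols[i]]])
  }
}

names(out)
```

Two pairs combined with two functions give four result columns, one per pair and function.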

Examples

For the basic functionality of across() please refer to the examples in dplyr::across().

library(dplyr)
library(dplyover) # provides across2() and across2x()

# For better printing
iris <- as_tibble(iris)

across2() can be used to transform pairs of variables with one or more functions. In the example below we want to calculate the product and the sum of all pairs of 'Length' and 'Width' variables. We can use {pre} in the glue specification in .names to extract the common prefix of each pair of variables. We can further transform the names, in this example converting them to lower case, by passing tolower to the .names_fn argument:

iris %>%
  transmute(across2(ends_with("Length"),
                    ends_with("Width"),
                    .fns = list(product = ~ .x * .y,
                                sum = ~ .x + .y),
                    .names = "{pre}_{fn}",
                    .names_fn = tolower))
#> # A tibble: 150 x 4
#>   sepal_product sepal_sum petal_product petal_sum
#>           <dbl>     <dbl>         <dbl>     <dbl>
#> 1          17.8       8.6          0.28       1.6
#> 2          14.7       7.9          0.28       1.6
#> 3          15.0       7.9          0.26       1.5
#> 4          14.3       7.7          0.3        1.7
#> # ... with 146 more rows
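
The column names in the output above can be reproduced with a short base R sketch. `common_prefix()` is a hypothetical helper written for this illustration; the package's actual prefix extraction may work differently, and `paste()` stands in for the `"{pre}_{fn}"` glue specification.

```r
# Hypothetical helper, not part of any package: longest shared leading run of
# characters, with trailing non-alphanumeric characters removed.
common_prefix <- function(x, y) {
  xs <- strsplit(x, "")[[1]]
  ys <- strsplit(y, "")[[1]]
  n <- min(length(xs), length(ys))
  same <- xs[seq_len(n)] == ys[seq_len(n)]
  # Number of leading characters that match before the first mismatch
  k <- if (all(same)) n else which(!same)[1] - 1
  sub("[^[:alnum:]]+$", "", substr(x, 1, k))
}

pre <- c(
  common_prefix("Sepal.Length", "Sepal.Width"),
  common_prefix("Petal.Length", "Petal.Width")
)
fn <- c("product", "sum")

# Stand-in for .names = "{pre}_{fn}" followed by .names_fn = tolower
nms <- tolower(paste(rep(pre, each = length(fn)), fn, sep = "_"))
nms
```

The shared leading text of each pair is "Sepal." and "Petal."; dropping the trailing dot leaves the alphanumeric prefix that ends up in the column names.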

across2x() can be used to perform calculations on each combination of variables. In the example below we calculate the correlation between all variables in the iris data set for each group. To do this, we group_by 'Species' and pass the tidyselect helper everything() to .xcols and .ycols. ~ round(cor(.x, .y), 2) gives us the correlation rounded to two digits for each pair of variables. We trim the rather long variable names by replacing "Sepal" with "S", and "Petal" with "P" in the .names_fn argument. Finally, we are not interested in correlations of a column with itself and want to avoid excessive results, so we set the .comb argument to "minimal".

iris %>%
  group_by(Species) %>%
  summarise(across2x(everything(),
                     everything(),
                     ~ round(cor(.x, .y), 2),
                     .names_fn = ~ gsub("Sepal", "S", .x) %>%
                                     gsub("Petal", "P", .),
                     .comb = "minimal"))
#> # A tibble: 3 x 7
#>   Species    S.Length_S.Width S.Length_P.Length S.Length_P.Width S.Width_P.Length
#>   <fct>                 <dbl>             <dbl>            <dbl>            <dbl>
#> 1 setosa                 0.74              0.27             0.28             0.18
#> 2 versicolor             0.53              0.75             0.55             0.56
#> 3 virginica              0.46              0.86             0.28             0.4 
#> # ... with 2 more variables: S.Width_P.Width <dbl>, P.Length_P.Width <dbl>
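
As a cross-check, the same grouped correlations can be computed in base R. This is a sketch of an equivalent calculation, not of how across2x() works internally; `combn()` is used here because it yields each unordered pair of distinct columns exactly once, and the names `num_cols`, `pairs`, `by_species` and `res` are introduced for this example only.

```r
dat <- datasets::iris

# The four numeric columns; Species is the grouping variable
num_cols <- names(dat)[sapply(dat, is.numeric)]

# Each unordered pair of distinct columns, as a list of length-2 character vectors
pairs <- combn(num_cols, 2, simplify = FALSE)

# One data frame per species
by_species <- split(dat, dat$Species)

# Rows: species, columns: column pairs
res <- sapply(pairs, function(p) {
  sapply(by_species, function(d) round(cor(d[[p[1]]], d[[p[2]]]), 2))
})
colnames(res) <- sapply(pairs, paste, collapse = "_")

res
```

The result is a 3 x 6 matrix whose columns correspond to the six untrimmed pair names, for example `Sepal.Length_Sepal.Width`.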