Tidy evaluation

Tidy evaluation… extended!

This is an extension of a post I saw here. There are several instances when I want to pass unquoted variable names to a function to generate outputs. There are also instances when I need to pass quoted variable names (strings) instead. We’ll walk through some examples and how the code needs to be set up for each case.

Using unquoted variable names in a function

So let’s say we want to create a function that will generate a frequency table. Let’s use {janitor} and {dplyr}.

library('janitor')
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library('dplyr')
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

We’ll use the mtcars data frame for illustrative purposes. To create a frequency table of one of the variables in mtcars, we can use the code below.

mtcars %>%
  tabyl(cyl) %>%
  adorn_pct_formatting(digits = 0) %>%
  adorn_totals()
##    cyl  n percent
##      4 11     34%
##      6  7     22%
##      8 14     44%
##  Total 32       -

Now let’s say we don’t want to write the same code over and over again to check several variables in our dataset. We want to create a function that takes a variable name and produces the same output.

tabyl_function_wrong <- function(df, var_name){
  
  df %>%
    tabyl(var_name) %>%
    adorn_pct_formatting(digits = 0) %>%
    adorn_totals()
  
}

Now if we tried to use the function above it wouldn’t work. Let’s see it in action.

tabyl_function_wrong(mtcars, cyl)
## Error: object 'cyl' not found

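As you can see, R isn’t able to find cyl. tabyl() uses non-standard evaluation: it captures the expression it is given rather than its value. Inside tabyl_function_wrong(), that expression is the name var_name, not cyl. Since mtcars has no column called var_name, the name is looked up as an ordinary variable, which in turn makes R look for an object named cyl in the environment the function was called from, and no such object exists. What we want is for cyl to be treated as a column of the data frame. The enquo() function captures the expression the user typed (together with its environment) as a quosure, and the unquoting operator !! then inserts that expression into the tabyl() call, so that tabyl() sees cyl as if we had typed it directly.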
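To see the capture step on its own, here is a minimal sketch that calls {rlang} directly ({dplyr} re-exports enquo() from it). The helper show_capture() is hypothetical and exists only for illustration; it reports what enquo() captured and then evaluates that expression with the data frame’s columns in scope.

```r
library(rlang)

# Hypothetical helper (not part of any package): shows what enquo() captures
show_capture <- function(df, var_name) {
  var_name <- enquo(var_name)                 # capture the expression the caller typed
  list(
    label  = as_label(var_name),              # the captured expression as text
    values = eval_tidy(var_name, data = df)   # evaluate it with df's columns in scope
  )
}

res <- show_capture(mtcars, cyl)
res$label          # the captured expression, as a string
head(res$values)   # first few values of the cyl column
```

The captured expression is cyl itself, not its value, which is why it can later be resolved as a column. Applied to tabyl(), the same capture-then-unquote pattern looks like this: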

tabyl_function <- function(df, var_name){
  
  # capture the unquoted column name the caller typed
  var_name <- enquo(var_name)
  
  df %>%
    tabyl(!!var_name) %>%   # unquote it so tabyl() sees the column name
    adorn_pct_formatting(digits = 0) %>%
    adorn_totals()
  
}

Now if we tried to use the above function it should work. Let’s see it in action.

tabyl_function(mtcars, cyl)
##    cyl  n percent
##      4 11     34%
##      6  7     22%
##      8 14     44%
##  Total 32       -
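As a side note, rlang 0.4.0 and later provide the embrace operator {{ }}, which combines the enquo() and !! steps into one. The sketch below uses dplyr::count() rather than tabyl() so that it relies only on {dplyr}; the function names count_enquo() and count_embrace() are hypothetical. Whether {{ var_name }} can be passed straight to tabyl() depends on how the installed {janitor} version captures its arguments, so treat that as an assumption to verify rather than a given.

```r
library(dplyr)

# Two-step version: capture with enquo(), then unquote with !!
count_enquo <- function(df, var_name) {
  var_name <- enquo(var_name)
  df %>% count(!!var_name)
}

# One-step version: embrace the argument with {{ }}
count_embrace <- function(df, var_name) {
  df %>% count({{ var_name }})
}

a <- count_enquo(mtcars, cyl)
b <- count_embrace(mtcars, cyl)
a
b
```

Both calls should return the same counts per value of cyl, matching the n column of the frequency table above.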

Using quoted variable names in a function

Now let’s say we want to concatenate several tables based on quoted variable names. This time we need to use sym() instead of enquo(): the input is already a string such as "cyl", so there is no unquoted expression to capture. sym() converts the string into a symbol, and unquoting that symbol with !! makes tabyl() treat it as a column name within mtcars.

We’re also going to use {purrr}, whose map_dfr() applies a function to each variable name and row-binds the resulting tables into a single data frame.

library(purrr)

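c("cyl", "gear", "carb") %>%
  map_dfr(function(x) {

    varname <- x   # keep the string to label the rows later
    x <- sym(x)    # convert the string into a symbol

    mtcars %>%
      tabyl(!!x, vs) %>%                     # cross-tabulate the variable against vs
      adorn_percentages() %>%                # row percentages (the default)
      adorn_pct_formatting(digits = 0) %>%
      adorn_ns(position = "rear") %>%        # append the counts in parentheses
      mutate(varname = varname) %>%
      rename("varval" = 1)                   # give the first column a common name
  })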
##  varval         0        1 varname
##       4   9%  (1) 91% (10)     cyl
##       6  43%  (3) 57%  (4)     cyl
##       8 100% (14)  0%  (0)     cyl
##       3  80% (12) 20%  (3)    gear
##       4  17%  (2) 83% (10)    gear
##       5  80%  (4) 20%  (1)    gear
##       1    0% (0) 100% (7)    carb
##       2   50% (5)  50% (5)    carb
##       3  100% (3)   0% (0)    carb
##       4   80% (8)  20% (2)    carb
##       6  100% (1)   0% (0)    carb
##       8  100% (1)   0% (0)    carb
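The string-to-symbol step can also be tried in isolation. This is a minimal sketch that uses dplyr::count() in place of tabyl() to keep the dependencies small; count_by_name() is a hypothetical helper, not part of any package.

```r
library(dplyr)
library(rlang)

# Hypothetical helper: takes the column name as a string
count_by_name <- function(df, col_name) {
  col <- sym(col_name)    # "cyl" (a string) becomes cyl (a symbol)
  df %>% count(!!col)     # unquote so count() sees the bare column name
}

cyl_counts <- count_by_name(mtcars, "cyl")
cyl_counts

# Looping over several quoted names, as in the tabyl() example above
results <- lapply(c("cyl", "gear", "carb"), function(x) count_by_name(mtcars, x))
```

Each element of results is a small data frame of counts for one variable. Unlike the tabyl() example, the first column keeps its original name here, so the tables would need a common column name before being row-bound.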

There you have it! I hope this was helpful and will be useful later on in your coding experience!

Chong H. Kim
Health Economics & Outcomes Researcher

My research interests include health economics & outcomes research (HEOR), real-world evidence/observation research, predictive modeling, and spatial statistics.
