Behind the scenes of tidyverse development: new dplyr functions | Davis Vaughan | Data Science Lab
The Data Science Lab is a live weekly call. Register at pos.it/dslab! Discord invites go out each week on live calls. We’d love to have you!
The Lab is an open, messy space for learning and asking questions. Think of it like pair coding with a friend or two. Learn something new, and share what you know to help others grow.
On this call, Libby Heeren is joined by Davis Vaughan, Positron and tidyverse dev, who walks through the newest functions in dplyr, the problems they solve, how decisions are made to add or replace functions, and how community-driven development drives the tidyverse (and how wife-driven development drives Davis!).
Hosting crew from Posit: Libby Heeren, Isabella Velasquez, Davis Vaughan
Davis’s Bluesky: https://bsky.app/profile/davisvaughan.bsky.social
Davis’s LinkedIn: https://www.linkedin.com/in/davis-vaughan/
Davis’s GitHub: https://github.com/DavisVaughan

Resources mentioned in the video and chat:
Tidyverse (R package collection) → https://www.tidyverse.org
dplyr (R package) → https://dplyr.tidyverse.org
tidyr (R package) → https://tidyr.tidyverse.org
Air (R formatter) → https://github.com/DavisVaughan/air
Tidy-ups (Tidyverse GitHub Repo) → https://github.com/tidyverse/tidyups
Python Enhancement Proposals (PEPs) → https://peps.python.org
dbplyr (R package) → https://dbplyr.tidyverse.org
Lifecycle meanings →
clock (R package) → https://clock.r-lib.org
Advanced R (Book by Hadley Wickham) → https://adv-r.hadley.nz
duckplyr (R package) → https://duckplyr.tidyverse.org
R devel notes about %notin% being added to R natively soon → https://stat.ethz.ch/R-manual/R-devel/doc/html/NEWS.html

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
The Lab: https://pos.it/dslab
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!
Timestamps of Questions / Topics:
00:00 Introduction
02:01 Demo of why we needed the new recode_values function and what it does
08:49 “What all was going on behind the scenes that came together for you to say… it’s definitely time to take action?”
10:19 Demo showing how to thumbs-up an issue on GitHub
11:55 “Could you show us how packages like dplyr are created”
12:31 “So if we have a data dictionary, we could use recode_values with across() to apply the right values to the right question?”
14:07 “Why dplyr 1.2 as opposed to 2.0?”
15:15 “Can you talk a little bit about what a tidyup means?”
19:08 Demo of filter_out
23:51 “I didn’t catch that anonymous formula notation in the last shared screen. What was that for?”
24:32 “What’s the difference between a comma and an and?”
26:17 “Did Libby say wife-driven development?!” (She did)
29:08 “Thoughts on incorporating the power of dplyr into Snowflake pipelines for data transformations.”
30:58 Explaining the anonymous notation and what .x means
33:48 “Can we touch on deprecation?” and the difference between deprecating and superseding
39:03 “Is there an internal rule of thumb on how long before something goes from experimental to stable?”
40:10 “Why would we need a replace_values?” (and replace_values demo)
40:35 “How often do you have these wife-driven coding moments?”
41:17 “What exactly is a gist?” (feel free to roast Libby for how she says gist)
46:58 “Is there a list of packages or functions that are going to be archived so I can get a little bit more lead time if this ever happens again?”
49:02 “Could coalesce() be used on multiple columns?”
50:42 “Why do I keep seeing this dot dot dot as an argument option and what does that mean?”
54:22 “Any chance of duckplyr going mainstream and included in dplyr?”
Davis Vaughan, Hadley Wickham
air
dbplyr
dplyr
duckplyr
Positron
tidyr
tidyups
tidyverse
tidyverse.org