The Three P’s: Programming, Packages, and Projects
Starting To Programme in R
One of the strong points of R (and RStudio) is its extensive community. It is very easy to find answers to questions about functions or commands on Google (using sources like Stack Overflow, Cross Validated, Reddit, and R Bloggers) or ChatGPT. So, whenever you have a problem, a question or a doubt, try these tools to find the answer!
Moreover, R contains a help repository covering all of its functions. You can use the help() function to access the supporting material related to a specific function or package. You can also find a Help tab in the bottom-right corner of RStudio. Another very useful resource is the collection of R Cheatsheets. These are compact documents listing some of the most useful functions for working in R across different topics and applications: https://posit.co/resources/cheatsheets/?type=posit-cheatsheets&_page=1/
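As a rough illustration of the help system described above, the sketch below shows a few common ways to reach the documentation from the console. The function mean() and the package stats are used only as examples because they ship with every R installation; the interactive calls are wrapped in interactive() so the snippet also runs safely as a script.

```r
# Calls that open help pages are most useful in an interactive session.
if (interactive()) {
  help("mean")                        # open the help page for a function
  ?mean                               # shorthand for the same thing
  help(package = "stats")             # overview of a whole package
  help.search("standard deviation")   # search when you do not know the name
  example("mean")                     # run the examples from the help page
}

# help() also returns an object, so you can check whether a page was found.
found <- help("mean")
length(found) > 0
```

If the last line returns FALSE for a function you are looking up, the package that contains it is probably not installed or not loaded.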
One problem shared by these resources is that they require you to know what you are talking about when describing R-related queries and problems. For community forums, this matters because the community is so diverse in its expertise and background that different levels of R knowledge are often assumed, and different R conventions are employed. Other times, however, it is simply because you can describe the same problem in a hundred different ways, and some ways are better and more efficient than others. Another problem, which forums largely avoid but which is very relevant to AI tools such as ChatGPT, is the varying quality of solutions. Answers in forums are judged by the community, while answers from ChatGPT are judged by you, the person who had the problem to begin with.
To navigate this landscape competently, you need to (roughly) know what you are looking for, and in this case that basically means learning how to “read code”. To read code, you must either gain extensive experience in working with code (for example, through research), or you must study coding in some capacity. It is worth pointing out that I believe the best way to learn how to do statistical analyses and to code is to practice. Studying these resources will only get you so far, and in my experience, doing statistics and using R in real life is the best way to develop proficiency. In this way, learning statistics and R is quite literally like learning a new language.
R Projects
As your work in R grows, it becomes important to organize your files and maintain reproducibility. This is where R Projects come in. R projects are organized workflows within the R programming environment, designed to manage code, data, and outputs in a structured way.
For those of you with experience using other statistical software, such as Stata, SPSS, or GIS, the concept of a “working directory” may sound familiar. A working directory is what R assumes to be your workspace: when looking for files to import, or when exporting files, this directory is the first place it will go. By default, RStudio will pick the user directory on your computer or the directory “My Documents” as this workspace, so this is what you usually see in the Files tab the first time you open RStudio.
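To make the idea of a working directory concrete, here is a minimal sketch using base R functions. The temporary folder from tempdir() is used only so that the example runs on any machine; in practice you would point R at a folder of your own.

```r
# Where does R currently look for and save files?
wd <- getwd()
print(wd)

# Which files does R see in that directory?
head(list.files(wd))

# Point R at a different folder for this session.
# setwd() invisibly returns the previous working directory.
old_wd <- setwd(tempdir())
getwd()

# Switch back to where you started.
setwd(old_wd)
```

Changing the working directory by hand like this works, but it has to be repeated in every session, which is one reason projects are more convenient.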
Projects are basically R’s equivalent of a “(working) directory”. This is where R looks for files that you ask it to load, and where it will put any files that you ask it to save. Indeed, keeping all the files associated with a given project (input data, R scripts, analytical results, and figures) together in one directory is such a wise and common practice that RStudio has built-in support for this via projects.
Each project has its own R session, which helps prevent objects from one project from interfering with another. For more advanced users, R projects integrate seamlessly with Git, allowing you to track changes over time. This is actually how I keep this website and these blog posts up to date.
To create a new project, you can navigate to the top-right corner of RStudio (below the red X and above the global environment). From there, you can either open an existing project or create a new one. To create a new project, you will normally go through the following steps:
1. Click File > New Project OR click Project: (None) > New Project.
2. If you have not made a dedicated folder already for your project, click New Directory. If you have made a dedicated folder already, click Existing Directory. More advanced users can integrate Version Control should they wish to do so, but I will not cover that here.
3a. If you clicked New Directory in 2., you need to decide what type of project you will make. The most straightforward is the first option (New Project), but you can also click Quarto Project if you intend to use Quarto files as the main source of analysis.
3b. If you clicked Existing Directory in 2., you need to tell R where the folder you want your R project to live in is located and what it is called. You need to do this so R knows where to save all the relevant project files.
4. Following from 3a, you need to give your directory a name. This will become the name of the folder which stores your R project. The box below Directory Name allows you to tell R where you want this folder to be stored.
Projects provide a dedicated folder structure for storing all files related to a specific piece of analysis. R projects help keep files organized, but also make it easier to collaborate and share your work. They do this by providing users with consistent links and file paths, which simplifies reproducibility and portability across different systems. When you create a project file, the working directory for that project is automatically set to the folder where you saved this file. Once the project is created, you should see that the icon in the top-right corner now displays the project name instead of Project: (None).
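The benefit of consistent file paths can be sketched as follows. This is only an illustration under stated assumptions: the folder my_project and the file example.csv are hypothetical names, and setwd() stands in for what RStudio does automatically when you open a project.

```r
# Stand-in for a project folder (hypothetical name). When you open a real
# project, RStudio sets the working directory to this folder for you.
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
old_wd <- setwd(project_dir)

# Paths are written relative to the project folder, not to one computer.
data_path <- file.path("data", "example.csv")   # hypothetical file
write.csv(data.frame(id = 1:3, score = c(4.5, 3.2, 5.0)),
          data_path, row.names = FALSE)

# The same relative path works for reading the file back in.
example_data <- read.csv(data_path)
print(example_data)

# Restore the original working directory.
setwd(old_wd)
```

Because the path data/example.csv never mentions a user name or drive letter, the same script should run unchanged when the whole project folder is copied to another machine.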
Each project usually contains a certain number of files and has a particular structure. Part of this structure can be intuited through the cover picture of this post, but some more specific details on the files you need to keep together are provided below:
1. An R Project File (.Rproj)
2. A .Rproj.user folder
3. A .Rhistory source file
4. Some file for storing your code/analysis (e.g., an R Script (.R) or a Quarto file (.qmd))
5. An R Workspace
The folder that contains 1), 2), and 3) is the ecosystem of your analysis; this is your project directory. Anything outside of this folder is not part of the project by default, so any file you want to use in your analysis should live in this project directory. 4) is the file you will use to conduct your analysis. That is, the file where you can insert code, generate tables and graphs, and produce statistical outputs. 5) is the dataset you will use throughout the research.
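If you want to check from the console what actually sits inside a project directory, a sketch along these lines may help. The folder demo_project and the files in it are empty placeholders created only for this example (they do not make a working RStudio project); the history file is assumed to be named .Rhistory on disk, which makes it a hidden file on most systems.

```r
# Build a throwaway folder with placeholder files (hypothetical names).
project_dir <- file.path(tempdir(), "demo_project")
dir.create(project_dir, showWarnings = FALSE)
file.create(file.path(project_dir,
                      c("demo_project.Rproj", "analysis.R", ".Rhistory")))

# By default, files whose names start with a dot are not listed.
visible <- list.files(project_dir)
print(visible)

# all.files = TRUE also shows hidden files; no.. = TRUE drops "." and "..".
everything <- list.files(project_dir, all.files = TRUE, no.. = TRUE)
print(everything)

# Check that a specific file is inside the project folder.
file.exists(file.path(project_dir, "analysis.R"))
```

In a real project you would call list.files() without a path, since the working directory is already the project folder.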
If you are a student in my University of Groningen courses, you do not need to create R Projects from scratch. Nor will you have to create Quarto documents and independently source data sets. Rather, we do all that work for you. This means that you will have everything you need to get started on your assignments: you can open the Quarto document and begin working right away. Once you leave this course, however, things won’t be so easy, so it is important that you learn how these projects and files work. The best way to do this is to practice by making your own and experimenting with them, but here are some useful resources you can use to learn more about them:
Working with Scripts and Projects: https://r4ds.hadley.nz/workflow-scripts.html
Using R Projects: https://support.posit.co/hc/en-us/articles/200526207-Using-RStudio-Projects
Using R Projects: https://www.youtube.com/watch?v=MdTtTN8PUqU