There are many ways to import data into R, depending on the format your data is in. In general, when importing data you have to tell R which directory to read from. By default, R resolves file paths against the current working directory, which is typically the project folder when you work inside a project. Although you can point it at an absolute location on your local machine, this is not recommended for code that will end up on GitHub, because others would have to edit that line before the code works on their machine. You can check your working directory with the following:
getwd()
## [1] "C:/Users/eckha/Desktop/Project-Template"
This looks fine for the time being. It’s odd that it decided to use the “Supporting Files” folder, but we can fix that using the special path “..”. In a file path, “..” refers to the parent directory, so setwd("..") moves the working directory up one folder. In this case, it moves it from “~Project-Template/Supporting Files” to “~Project-Template”.
setwd("..")
getwd()
## [1] "C:/Users/eckha/Desktop"
So now the working directory is set to the project folder itself. This will make using relative paths easier. An absolute path, such as the output of getwd(), starts at the root of the file system and only exists on one machine. A relative path is resolved from the current working directory instead. This is helpful for code sharing, because relative paths work on any machine as long as the working directory points at the same project folder.
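As a minimal sketch of the difference, the snippet below builds a relative path and shows the absolute path it would resolve to from the current working directory. The "Data" folder and file name are the ones used later in this document, and nothing here checks that the file actually exists. It also relies on the fact that setwd() returns the previous working directory, which makes a temporary change easy to undo.

```r
# Relative path: resolved from the working directory, not the file system root.
rel_path <- file.path("Data", "mtcars-test-data.csv")

# The absolute path this relative path resolves to on the current machine.
abs_path <- file.path(getwd(), rel_path)
print(rel_path)
print(abs_path)

# setwd() invisibly returns the previous working directory,
# so a temporary move to the parent folder can be undone afterwards.
old_wd <- setwd("..")
print(getwd())
setwd(old_wd)
```

Only rel_path belongs in shared code. abs_path differs from machine to machine, which is exactly why it should not be hard-coded.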
As mentioned above, data can be imported into R from many formats. The simplest, at least in my opinion, is a .csv file. Unlike Excel files, .csv files are plain text, which R can read directly without any extra conversion. A .csv file can also be created from an Excel file, or opened in Excel by those who are less R savvy. To import data in a reproducible way, we can create one object that holds a relative path and another that holds the file name. In this case, we’ll play with the mtcars dataset. We will import the .csv file, which is stored in the “Data” folder.
#loading necessary packages
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'purrr' was built under R version 4.3.3
## Warning: package 'lubridate' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#specifying a relative directory our data is stored in
dir_path <- "Data/"
#getting the file name saved
name_of_file <- "mtcars-test-data.csv"
#using read_csv to import it. paste0 will concatenate the two together
df.cars <- read_csv(paste0(dir_path, name_of_file))
## New names:
## • `` -> `...1`
## Rows: 32 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ...1
## dbl (11): mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Checking the first few rows of the data
head(df.cars)
## # A tibble: 6 × 12
##   ...1           mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##   <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Mazda RX4     21       6   160   110  3.9   2.62  16.5     0     1     4     4
## 2 Mazda RX4 W…  21       6   160   110  3.9   2.88  17.0     0     1     4     4
## 3 Datsun 710    22.8     4   108    93  3.85  2.32  18.6     1     1     4     1
## 4 Hornet 4 Dr…  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1
## 5 Hornet Spor…  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2
## 6 Valiant       18.1     6   225   105  2.76  3.46  20.2     1     0     3     1
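The “New names” message above means the first column had an empty header, which read_csv renamed to ...1. One plausible cause, and this is an assumption since the document does not say how the file was made, is that it was written with base R’s write.csv(), which stores row names in an unnamed first column by default. The sketch below recreates such a file in a temporary folder, builds the path with file.path() rather than paste0(), quiets the column specification message, and gives the first column a readable name. The temporary folder and the column name model are illustrative choices, not part of the original project.

```r
library(readr)  # read_csv()
library(dplyr)  # rename()

# Assumption: a temporary folder stands in for the project's "Data" folder.
dir_path <- file.path(tempdir(), "Data")
dir.create(dir_path, showWarnings = FALSE)
name_of_file <- "mtcars-test-data.csv"

# file.path() joins the pieces with "/" so no trailing slash is needed in dir_path.
csv_path <- file.path(dir_path, name_of_file)

# write.csv() keeps row names by default and writes them
# as a first column with an empty header.
write.csv(mtcars, csv_path)

# show_col_types = FALSE suppresses the column specification message.
df.cars <- read_csv(csv_path, show_col_types = FALSE)

# Rename the unnamed first column (read in as `...1`); "model" is a hypothetical name.
df.cars <- rename(df.cars, model = `...1`)
head(df.cars)
```

Renaming right after import keeps later code from depending on the auto-generated ...1 name.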