Consider the two tables below. Which would be easier to work with?
Table 1
## # A tibble: 10 x 5
##    country      year    pr    cl status
##    <chr>       <dbl> <dbl> <dbl> <chr>
##  1 Afghanistan  1973     4     3 Partly Free
##  2 Afghanistan  1974     1     2 Not Free
##  3 Afghanistan  1975     1     2 Not Free
##  4 Afghanistan  1976     1     2 Not Free
##  5 Afghanistan  1977     1     2 Not Free
##  6 Afghanistan  1978     2     2 Not Free
##  7 Afghanistan  1979     1     1 Not Free
##  8 Afghanistan  1980     1     1 Not Free
##  9 Afghanistan  1981     1     1 Not Free
## 10 Afghanistan  1982     1     1 Not Free
Table 2
## # A tibble: 10 x 6
##    `Year(s) Under Review` `1972` ...3  ...4   `1973` ...6
##    <chr>                  <chr>  <chr> <chr>  <chr>  <chr>
##  1 <NA>                   PR     CL    Status PR     CL
##  2 Afghanistan            4      5     PF     7      6
##  3 Albania                7      7     NF     7      7
##  4 Algeria                6      6     NF     6      6
##  5 Andorra                4      3     PF     4      4
##  6 Angola                 <NA>   <NA>  <NA>   <NA>   <NA>
##  7 Antigua and Barbuda    <NA>   <NA>  <NA>   <NA>   <NA>
##  8 Argentina              6      3     PF     2      2
##  9 Armenia                <NA>   <NA>  <NA>   <NA>   <NA>
## 10 Australia              1      1     F      1      1
Hopefully it’s clear that Table 1 is easier to work with. But why exactly is that? And how can the messy Table 2 be made to look like the clean Table 1? In this presentation, I look at techniques for tidying data.
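As a rough preview of the kind of reshaping involved, here is a minimal sketch of one possible route from a Table 2 layout to a Table 1 layout. It is not necessarily the approach taken later in the presentation. It assumes the dplyr and tidyr packages (tidyr 1.0 or later, for `pivot_longer()`), and `messy` is a small hand-typed stand-in for Table 2 rather than the real file. The object names `header` and `tidy` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hand-typed stand-in for Table 2 (assumption: the real data would be read
# from a spreadsheet, which is where the `...3`-style placeholder names come from).
messy <- tibble(
  `Year(s) Under Review` = c(NA, "Afghanistan", "Albania", "Angola"),
  `1972` = c("PR", "4", "7", NA),
  `...3` = c("CL", "5", "7", NA),
  `...4` = c("Status", "PF", "NF", NA),
  `1973` = c("PR", "7", "7", NA),
  `...6` = c("CL", "6", "7", NA)
)

# The header is split in two: years sit in the column names (only above the
# first column of each block) and variable names sit in the first data row.
header <- tibble(
  year     = names(messy)[-1],
  variable = tolower(unlist(messy[1, -1], use.names = FALSE))
) %>%
  mutate(year = if_else(grepl("^\\.\\.\\.", year), NA_character_, year)) %>%
  fill(year)  # carry each year forward across its block of columns

# Give every column a single "year_variable" name, e.g. "1972_pr".
names(messy) <- c("country", paste(header$year, header$variable, sep = "_"))

tidy <- messy %>%
  slice(-1) %>%  # drop the row that held the variable names
  pivot_longer(
    -country,
    names_to  = c("year", ".value"),  # ".value" keeps pr/cl/status as columns
    names_sep = "_"
  ) %>%
  mutate(
    year   = as.numeric(year),
    pr     = as.numeric(pr),
    cl     = as.numeric(cl),
    status = recode(status, F = "Free", PF = "Partly Free", NF = "Not Free")
  )

print(tidy)
```

The result has one row per country-year and one column each for `country`, `year`, `pr`, `cl`, and `status`, which is the shape of Table 1. Because the stand-in has no status column for 1973, `status` is `NA` in those rows.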
Recommended Readings:
- Chapter 12 in R for Data Science, by Hadley Wickham and Garrett Grolemund