This is a short note about reading tables from common office document formats into R.
xls and xlsx
After problems with large worksheets I have turned to using the readxl package.
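    library(readxl)
    # List the names of the worksheets in the workbook
    sheet_names <- excel_sheets("my_file.xls")
    # Read cells A3:C7 of the first sheet; the first row of that range holds the column names
    d <- read_excel("my_file.xls", sheet = 1, range = "A3:C7", col_names = TRUE)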
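If a workbook has several sheets, the two calls above can be combined to read all of them at once. The following is only a sketch: it uses the example workbook that ships with readxl (located via readxl_example()) as a stand-in for "my_file.xls", and the variable names path and all_sheets are made up for illustration.

```r
library(readxl)

# Assumption: the bundled example workbook stands in for your own file,
# e.g. path <- "my_file.xls"
path <- readxl_example("datasets.xlsx")

# One name per worksheet
sheet_names <- excel_sheets(path)

# Read every sheet into a list of data frames, one element per sheet
all_sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheet_names

# Show the dimensions of each sheet
print(lapply(all_sheets, dim))
```

Each sheet is then available by name, for example all_sheets[[sheet_names[1]]].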
Recently I had to convert a doc file to the docx format before I could extract a table from it. Extracting all tables from a docx file works like this:
    library(docxtractr)
    # Open the document
    docx <- read_docx("my_file.docx")
    # Extract every table; the result is a list of data frames
    tables <- docx_extract_all_tbls(docx)
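When only one table is needed, it may be more convenient to check how many tables the document contains and then pull out a single one. This is a rough sketch, assuming the example document "examples/data.docx" bundled with docxtractr is installed alongside the package; it stands in for "my_file.docx", and the name first_table is made up for illustration.

```r
library(docxtractr)

# Assumption: the example document shipped with docxtractr stands in for
# your own file, e.g. read_docx("my_file.docx")
docx <- read_docx(system.file("examples/data.docx", package = "docxtractr"))

# Number of tables found in the document
n_tables <- docx_tbl_count(docx)
print(n_tables)

# Extract only the first table, treating its first row as the header
first_table <- docx_extract_tbl(docx, tbl_number = 1, header = TRUE)
print(first_table)
```

Setting header = FALSE keeps the first row as data, which can help when a table has no header row.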