This is a short note about reading tables from common office document formats into R.

xls and xlsx

I have had good success with the XLConnect package, both with the old xls format and the XML-based xlsx format. The key function for me is readWorksheet, which reads a sheet (or a rectangular region of it) into a data frame.

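library(XLConnect)
wb <- loadWorkbook("my_file.xls")
# Read cells A3:C7 of "Sheet 1", using the first row of the region as column names
s1 <- readWorksheet(wb, sheet = "Sheet 1", region = "A3:C7", header = TRUE)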
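
If the sheet names are not known in advance, or numeric bounds are more convenient than an "A3:C7" string, something like the following may help. This is only a sketch: the temporary file and the demo data frame are made up so that the example runs on its own, and it assumes XLConnect (which needs a working Java installation) is installed.

```r
library(XLConnect)

# Build a small throwaway workbook so the example runs without an existing file.
# The path and the demo data are hypothetical.
path <- tempfile(fileext = ".xlsx")
wb <- loadWorkbook(path, create = TRUE)
createSheet(wb, name = "Sheet 1")
demo <- data.frame(id = 1:4,
                   name = c("a", "b", "c", "d"),
                   value = c(1.5, 2.5, 3.5, 4.5))
# Write the table with its header row starting at cell A3, so it occupies A3:C7.
writeWorksheet(wb, demo, sheet = "Sheet 1", startRow = 3, startCol = 1)
saveWorkbook(wb)

# Reopen the file and list the sheets it contains.
wb <- loadWorkbook(path)
sheets <- getSheets(wb)

# Read the same region, given as numeric bounds instead of a region string.
s1 <- readWorksheet(wb, sheet = "Sheet 1",
                    startRow = 3, startCol = 1, endRow = 7, endCol = 3,
                    header = TRUE)
```

For a real file, only the last two steps are needed, with path pointing at the workbook.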

docx

Recently I had to save a doc file in the docx format before I could extract a table from it. With the docxtractr package, extracting tables from docx works like this:

docx <- docxtractr::read_docx("my_file.docx")
# Returns a list of data frames, one per table found in the document
tables <- docxtractr::docx_extract_all_tbls(docx)
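
When only one table is needed, it may be easier to count the tables first and pull a single one out by position. A minimal sketch, assuming the sample document data.docx that ships with docxtractr is available at the path below; the variable names are mine, and for a real document path would simply be "my_file.docx".

```r
library(docxtractr)

# Assumption: docxtractr ships a sample document at examples/data.docx.
# Replace this with the path to your own file.
path <- system.file("examples/data.docx", package = "docxtractr")
docx <- read_docx(path)

# How many tables does the document contain?
n_tables <- docx_tbl_count(docx)

# Extract all tables as a list of data frames.
tables <- docx_extract_all_tbls(docx)

# Extract only the first table, using its first row as column names.
first_tbl <- if (n_tables > 0) {
  docx_extract_tbl(docx, tbl_number = 1, header = TRUE)
} else {
  NULL
}
```

Each element of tables is a data frame, so tables[[1]] and first_tbl refer to the same table, though the header handling can differ depending on the arguments used.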