Use Page Ranges in PyPDF2 PdfFileMerger
PdfFileMerger is a nice Python class provided by the PyPDF2 package. If you want to concatenate only selected pages of a PDF, you can pass it a page range expression.
To specify page ranges, you need to import the PageRange class:
from PyPDF2 import PdfFileMerger, PageRange
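Then you can specify the range as a PageRange() object. For example, the range from index 13 (REMEMBER: page indices start at zero, so this is the 14th page!) through the last page can be written:
merger.append(inpdf, pages=PageRange('13:'))
Note that '13:-1' would stop just before the last page, because the end of a range is exclusive, as in Python slices.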
More page range expression examples follow, if you want to play a bit with them:
:        all pages
22       just the 23rd page
0:3      the first three pages
:3       the first three pages
5:       from the sixth page onward
-1       last page
:-1      all but the last page
-2       second-to-last page
-2:      last two pages
-3:-1    third & second to last
The third, “stride” or “step” number is also recognized.
::2       0 2 4 ... to the end
1:10:2    1 3 5 7 9
::-1      all pages in reverse order
3:0:-1    3 2 1 but not 0
2::-1     2 1 0
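To check which pages an expression selects without opening a PDF, the expressions can be tried against a plain list of indices. The sketch below is only an illustration: expand_page_range is a hypothetical helper, not part of PyPDF2, and it assumes the expressions follow Python slice rules, as the examples above suggest.

```python
def expand_page_range(expr, num_pages):
    """Return the zero-based page indices that a range expression selects.

    Hypothetical helper for illustration only; it mimics Python slice
    rules and is not part of PyPDF2.
    """
    parts = expr.strip().split(":")
    if len(parts) > 3:
        raise ValueError(f"invalid page range: {expr!r}")
    if len(parts) == 1:
        # A lone integer selects exactly one page.
        start = int(parts[0])
        stop = start + 1 if start != -1 else None
        page_slice = slice(start, stop)
    else:
        # Empty fields fall back to the slice defaults, as in list slicing.
        page_slice = slice(*(int(p) if p.strip() else None for p in parts))
    return list(range(num_pages))[page_slice]


if __name__ == "__main__":
    for expr in (":", "22", ":3", "-2:", "13:-1", "13:", "3:0:-1"):
        print(f"{expr:8} {expand_page_range(expr, 25)}")
```

For a 20-page document, expand_page_range('13:-1', 20) stops at index 18, while expand_page_range('13:', 20) also includes index 19, the last page.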
See below for a complete working Python script with PyPDF2 and page ranges. Just set the input/output filenames and the starting/ending page indices, then run the script:
from PyPDF2 import PdfFileMerger, PdfFileReader, PageRange

infn = "infilename.pdf"
outfn = "outfilename.pdf"

startpage = 5   # zero-based index of the first page to keep (here: page 6)
endpage = ""    # empty means "through the last page"; -1 would drop the last page

# Pass only the path: the second positional argument is "strict", not a file mode
srcfile = PdfFileReader(infn)
merger = PdfFileMerger()
page_range = str(startpage) + ':' + str(endpage)
merger.append(srcfile, pages=PageRange(page_range))
merger.write(outfn)
merger.close()
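Converting the page numbers shown in a PDF viewer (which start at 1) into a zero-based expression is easy to get wrong by one. A minimal sketch of that conversion follows; page_range_expr is a hypothetical helper name, and the resulting string is what you would pass to PageRange().

```python
def page_range_expr(first_page, last_page=None):
    """Build a zero-based range expression from one-based page numbers.

    Both page numbers are inclusive, as a person would read them in a
    viewer. last_page=None means "through the end of the document".
    Hypothetical helper for illustration; not part of PyPDF2.
    """
    if first_page < 1:
        raise ValueError("first_page must be 1 or greater")
    start = first_page - 1  # one-based page number -> zero-based index
    if last_page is None:
        return f"{start}:"
    if last_page < first_page:
        raise ValueError("last_page must not be before first_page")
    # The stop index is exclusive, so the one-based last page is used as is.
    return f"{start}:{last_page}"


if __name__ == "__main__":
    print(page_range_expr(6))      # page 6 through the end
    print(page_range_expr(6, 10))  # pages 6 to 10 inclusive
```

With this helper, page 6 through the end becomes '5:' and pages 6 to 10 become '5:10'.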
And if you are using Ubuntu, remember to install the python3-pypdf2 package first:
sudo apt install python3-pypdf2