Excel Certificate Scraper User Guide
Extract data from all your downloaded PDF certificates and condense it into an easy-to-process Excel sheet!
Further tools to add your certificate data to your personal website coming soon!
Why Use These Macros?
While upgrading skills through online learning platforms, it quickly becomes apparent that few sites can display a large library of certificates in a manner that is concise, easy to search through, and not overwhelming to the viewer.
As such, it is preferable to extract the certificate data and migrate from the typical built-in LinkedIn profile certificate display to a custom certificate display site. The idea is to condense the information while still providing links to the individual certificate proofs, so as not to detract from the authenticity of the claims.
Downloading the Macros
The macros can be found in the pdfExtractionTest.xlsm file found here.
Click the download button as shown above and save the file to your computer to use the macros.
Using the Macros
Set up
Above is the initial view of the Excel sheet. If the workbook opens in Protected View, you need to enable editing. This allows the macros to add data to your sheets.
After enabling editing, you may be prompted to Enable Content. This allows the macros to run.
The warning shown above may appear because you are not the author of the macros. If you are concerned about the security of the file, the full code can be found here. To fix the error, you may try redownloading the file, or follow the instructions behind the Learn More link.
Additionally, sort your certificates into two folders, one for LinkedIn and one for Coursera. The names of the folders do not matter, and each folder can also have subfolders as shown in the example above. The only requirements are as follows:
- All files found in the folder and its subfolders are PDF files.
- The folder for Coursera certificates only contains Coursera certificates.
- The folder for LinkedIn certificates only contains LinkedIn certificates.
Note that files of the wrong type may break the macros and prevent them from running properly.
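Before running the macros, it can help to check that a certificate folder really contains only PDF files. The snippet below is a minimal Python sketch for that check and is not part of the macros themselves. The function name `find_non_pdf_files` is hypothetical, and the check only looks at file extensions, so it cannot tell a Coursera certificate from a LinkedIn one.

```python
from pathlib import Path


def find_non_pdf_files(folder):
    """Return every file under `folder` (including subfolders) that is not a PDF.

    Assumption: a file counts as a PDF if its extension is .pdf
    (case-insensitive). The file contents are not inspected.
    """
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")  # walks the folder and all subfolders
        if path.is_file() and path.suffix.lower() != ".pdf"
    )
```

Calling `find_non_pdf_files` with the path to your Coursera or LinkedIn folder returns a list of stray files. An empty list means the folder meets the first requirement above.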
Extracting File Paths Coursera
For the Coursera files, first click the Get FileNames Coursera button.
Next, select the folder containing all your Coursera certificates.
This will extract the file paths of all the PDF files in the folder and output them to the filePathsCoursera sheet.
Finally, to extract the data from the PDF files found, click the Extract PDF data Coursera button.
To view the output of the macros, open the parsedDataCoursera sheet. You may copy this data to any CSV file for further display or processing.
Extracting File Paths Linkedin
For the LinkedIn files, first click the Get FileNames Linkedin button.
Next, select the folder containing all your LinkedIn certificates.
This will extract the file paths of all the PDF files in the folder and output them to the filePathsLinkedin sheet.
Finally, to extract the data from the PDF files found, click the Extract PDF data Linkedin button.
To view the output of the macros, open the parsedDataLinkedin sheet. You may copy this data to any CSV file for further display or processing.
Tips for data use
You end up with one sheet for your Coursera data and one sheet for your LinkedIn data. You could display this data in tabular form on your personal website or LinkedIn profile for ease of viewing.
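As one possible way to show the data in tabular form on a website, the Python sketch below turns CSV text copied from a parsed data sheet into a plain HTML table. It is an illustration only, not part of the macros. The function name `csv_to_html_table` is hypothetical, and the sketch assumes the first CSV row holds the column headers, whatever they happen to be in your sheet.

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Convert CSV text into a simple HTML table string.

    Assumption: the first row of the CSV contains the column headers.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return "<table></table>"
    header, body = rows[0], rows[1:]
    lines = ["<table>"]
    # html.escape keeps characters such as & and < from breaking the markup.
    lines.append(
        "  <tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in header) + "</tr>"
    )
    for row in body:
        lines.append(
            "  <tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        )
    lines.append("</table>")
    return "\n".join(lines)
```

The returned string can be pasted into a page template. Styling, sorting, and links to the certificate proofs are left out to keep the sketch short.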
Assumptions
The macros make assumptions about the format of Coursera and LinkedIn certificates when they are imported as PDF into Excel. When a certificate does not match these assumptions, the macros may fail or produce errors.
If any macro-breaking errors occur, or if you have any questions, please raise them here so that I can apply the necessary fixes.
Error handling
When parsing goes wrong, the macros should handle the error and highlight the affected rows.
Red highlights
The macros were unable to locate the required data.
Yellow highlights
The data located is not as expected (e.g. the length of the title is too short).
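The two highlight levels can be pictured as a small validation rule. The Python sketch below is an illustration of that idea, not the code the macros actually run. The field names (`title`, `date`), the function name `classify_row`, and the minimum title length of 5 are all assumptions made for the example.

```python
def classify_row(row, required_fields, min_title_length=5):
    """Return "red", "yellow", or "ok" for one parsed certificate row.

    `row` is a dict of field name -> extracted text. The field names and
    the title-length threshold are hypothetical, chosen for illustration.
    """
    # Red: a required piece of data could not be located at all.
    for field in required_fields:
        if not str(row.get(field) or "").strip():
            return "red"
    # Yellow: data was located but does not look as expected,
    # for example a title that is suspiciously short.
    title = str(row.get("title") or "").strip()
    if len(title) < min_title_length:
        return "yellow"
    return "ok"


# Example rows using the hypothetical field names.
fields = ["title", "date"]
print(classify_row({"title": "Machine Learning", "date": "2021"}, fields))
print(classify_row({"title": "Machine Learning", "date": ""}, fields))
print(classify_row({"title": "ML", "date": "2021"}, fields))
```

Rows flagged red or yellow are worth checking against the original PDF before the data is reused elsewhere.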
Customization
If you have certificate collections from other providers that follow a standard format, you may request macro updates in the repo issues here.
Additionally, you may request other certificate parsing macros (e.g. for DataCamp) in the repo issues, or ask for new certificate fields to be scraped (e.g. special notes on LinkedIn certificates from secondary institutes).
Contribute
If you would like to contribute to these macros, check out the main repo here to view the code.