How do I get data from city websites when the data are stuck in PDFs?
Can someone help show us how to scrape data?
-
Valerie T. commented
Since they used data to create the PDFs, they should be willing and able to provide that data in a more usable form. What are the barriers to that approach?
-
Eric Fischer commented
Sure, yes, I can do a demo of basic website and PDF scraping.
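As a taste of what such a demo might cover: once text has been pulled out of a PDF with a layout-preserving extractor (e.g. `pdftotext -layout`), tabular data can often be recovered by splitting on runs of whitespace. This is only a minimal sketch; the sample records below are invented for illustration, and real city PDFs will need more cleanup.

```python
import csv
import io
import re

# Hypothetical sample: text as it might come out of a layout-preserving
# PDF extractor such as `pdftotext -layout`. Records are invented.
extracted = """\
Vendor Name          Contract No.   Amount
Acme Paving          C-2014-001     12,500.00
Bay Area Signs       C-2014-002      3,200.00
"""

def rows_from_layout_text(text):
    """Split each non-blank line on runs of 2+ spaces, the column
    gaps typically left by layout-preserving PDF text extraction."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(re.split(r"\s{2,}", line.strip()))
    return rows

rows = rows_from_layout_text(extracted)

# Write the recovered table back out as CSV.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

The fragile part is exactly what makes PDFs a bad publishing format: the regex only works when the extractor happens to preserve column alignment, which is never guaranteed.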
-
Marisa Raya commented
Specific example: being able to comment online on a draft specific plan with an app that matches the concerns you type in (housing, gentrification, jobs) to the relevant text in the plan, so you immediately see how each issue is treated and can suggest direct language improvements.
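The matching step of such an app could start as something very simple. The sketch below is a hypothetical illustration only: the plan sections, section numbers, and keyword list are all invented, and a real app would want proper search (stemming, synonyms) rather than exact word matching.

```python
# Hypothetical plan sections -- titles and text invented for illustration.
plan_sections = {
    "3.2 Housing": "New residential development shall include affordable housing units.",
    "4.1 Economic Development": "The plan supports local jobs and small business growth.",
    "5.3 Anti-Displacement": "Policies to mitigate gentrification and displacement.",
}

# An assumed, hand-picked vocabulary of concerns to look for.
CONCERN_KEYWORDS = {"housing", "gentrification", "displacement", "jobs"}

def match_concerns(comment, sections):
    """Return the titles of sections whose text mentions any concern
    keyword that also appears in the commenter's text."""
    found = {w.strip(".,!?").lower() for w in comment.split()} & CONCERN_KEYWORDS
    return sorted(
        title for title, text in sections.items()
        if any(keyword in text.lower() for keyword in found)
    )

print(match_concerns("I'm worried about gentrification and jobs", plan_sections))
```

Even this crude keyword intersection would let a commenter jump straight to the sections treating their issue, which is the core of the idea.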
-
John Osborn commented
This is an extremely important issue for me. Oakland is making strides when it comes to campaign finance data, but PDFs are really harming innovation in handling important documents like contracts, agenda items, voting records, etc.
-
Scott Law commented
If the transaction engines and reporting databases used by the City can produce PDFs, they can also produce ASCII text and comma-delimited files. For a small charge, Adobe provides a facility to convert PDF to Excel or Word, but de-formatting those files is a hassle; ASCII and comma-delimited are better choices. My bet is that this is a matter of training and process implementation, not a technical problem. It's actually a common corporate problem too, not just a city one.
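To underline the point about comma-delimited files being the better choice: parsing CSV takes one standard-library call, with no de-formatting step at all. The sample data here is invented for illustration.

```python
import csv
import io

# If the city published this data as CSV in the first place (sample
# records invented), recovering the table is a one-liner -- contrast
# with the guesswork needed to pull the same table out of a PDF.
data = "Vendor Name,Contract No.,Amount\nAcme Paving,C-2014-001,12500.00\n"
rows = list(csv.reader(io.StringIO(data)))
print(rows)
```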
-
Theresa O'Connor commented
The City of Oakland's predilection for PDFs makes me crazy. We need a commitment from the city to stop using them; otherwise, trying to scrape that data out is just fighting a losing battle.