Simon Willison @simonw 2022-09-20
If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data?
Simon Willison @simonw 2022-09-20
(I’m mainly interested in answers from people who do this kind of thing relatively often and DON’T use SQLite / Datasette to do it, but if you do use those I’m interested in hearing from you too!)
Simon Willison @simonw 2022-09-20
Follow up: same question for 1 million rows, 10 million rows, 1 billion rows
Simon Willison @simonw 2022-09-20
Wow the answers to this are absolutely fantastic, and VERY varied
Simon Willison @simonw 2022-09-20
My own answer: I either open the CSV directly in the Datasette Desktop Mac application (https://datasette.io/desktop) or I do this:
sqlite-utils insert /tmp/data.db rows big.csv --csv
datasette /tmp/data.db
That gives me a table called “rows” in a fresh SQLite database
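A rough standard-library equivalent of that import step, for anyone without sqlite-utils installed, might look like the sketch below. It is an illustration only and not how sqlite-utils works internally. The function name load_csv_to_sqlite is hypothetical, and the sketch assumes a UTF-8 file whose first row is a header and whose rows all have the same number of fields.

```python
import csv
import sqlite3


def load_csv_to_sqlite(csv_path, db_path, table="rows"):
    """Copy a CSV file (with a header row) into a SQLite table.

    Hypothetical helper for illustration; returns the table's row count.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row supplies the column names

        # Quote column names so spaces or punctuation in headers are safe.
        columns = ", ".join(
            '"{}"'.format(name.replace('"', '""')) for name in header
        )
        placeholders = ", ".join("?" for _ in header)

        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                # Columns are declared without types; values are stored as text.
                conn.execute(
                    f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})'
                )
                # Stream the remaining rows straight from the CSV reader.
                conn.executemany(
                    f'INSERT INTO "{table}" VALUES ({placeholders})', reader
                )
            count = conn.execute(
                f'SELECT COUNT(*) FROM "{table}"'
            ).fetchone()[0]
        finally:
            conn.close()
    return count
```

Calling load_csv_to_sqlite("big.csv", "/tmp/data.db") would produce a "rows" table that the datasette command above can then serve. Because the rows are streamed from the reader rather than read into memory first, the same approach should cope with files much larger than 100,000 rows.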
Kyle Falconer @pereclies 2022-09-20
Datasette looks interesting. Do you know of something like that but for Windows or Linux?
Shaun McDonald @smsm1 2022-09-21
Datasette works on Linux and Windows too from the command line; I'm not sure whether the GUI is available there. I’ve got a team using it regularly for exploring GTFS files.
Shaun McDonald @smsm1 2022-09-21
(where other more specific tools don’t succeed or trying to find very specific info that the other tools don’t make available easily).