Simon Willison @simonw 2022-09-20

If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data?


Simon Willison @simonw 2022-09-20

(I’m mainly interested in answers from people who do this kind of thing relatively often and DON’T use SQLite / Datasette to do it, but if you do use those I’m interested in hearing from you too!)


Simon Willison @simonw 2022-09-20

Follow up: same question for 1 million rows, 10 million rows, 1 billion rows


Simon Willison @simonw 2022-09-20

Wow the answers to this are absolutely fantastic, and VERY varied


Simon Willison @simonw 2022-09-20

My own answer: I either open the CSV directly in the Datasette Desktop Mac application (https://datasette.io/desktop) or I do this:

sqlite-utils insert /tmp/data.db rows big.csv --csv

datasette /tmp/data.db

That gives me a table called “rows” in a fresh SQLite database
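For readers without sqlite-utils installed, the same idea can be roughly sketched with only the Python standard library. This is an illustration, not how sqlite-utils itself is implemented: the function name load_csv_into_sqlite is hypothetical, it assumes the CSV has a header row and well-formed rows, and it stores every column as TEXT as a simplification.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="rows"):
    """Load a CSV file (with a header row) into a SQLite table.

    Hypothetical helper for illustration only. Returns the total number
    of rows in the table after loading.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        headers = next(reader)  # assumption: first line holds column names
        # Quote identifiers so column names with spaces or quotes still work.
        columns = ['"' + h.replace('"', '""') + '"' for h in headers]
        quoted_table = '"' + table.replace('"', '""') + '"'
        conn = sqlite3.connect(db_path)
        try:
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {quoted_table} "
                f"({', '.join(c + ' TEXT' for c in columns)})"
            )
            placeholders = ", ".join("?" for _ in headers)
            # Skip blank lines; every other row must match the header width.
            conn.executemany(
                f"INSERT INTO {quoted_table} VALUES ({placeholders})",
                (row for row in reader if row),
            )
            conn.commit()
            return conn.execute(
                f"SELECT COUNT(*) FROM {quoted_table}"
            ).fetchone()[0]
        finally:
            conn.close()
```

Calling load_csv_into_sqlite("big.csv", "/tmp/data.db") would leave a "rows" table in /tmp/data.db that the datasette command above can then open.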


Kyle Falconer @pereclies 2022-09-20

Datasette looks interesting. Do you know of something like that but for Windows or Linux?


Shaun McDonald @smsm1 2022-09-21

Datasette works on Linux and Windows too, from the command line. I’m not sure whether the GUI is available there. I’ve got a team using it regularly for exploring GTFS files.


Shaun McDonald @smsm1 2022-09-21

(where other, more specific tools don’t succeed, or when we’re trying to find very specific info that the other tools don’t make easily available).