It has been a hell of a week: three late nights and all-nighters in a row, working on everything from malware command-and-control servers to big data projects and right-wing extremism research. Juggling a million hot pans over the fire left very little time to work on my Jupyter Notebook for tracking my social stats and income streams, but today after my meetings I had some time to throw in some dead-simple data visualization. This will be a short blog, but it's all about building the cake one layer at a time...
Did you miss part one? Check it out here!
I like to post the code toward the top of the blog for people who just want to read the code and follow along... or the inevitable copy-pasters. I see y'all and I understand.
Pandas and Matplotlib: A match made in a stats-moron's dream
I am... horrible at math. Specifically, I'm very bad at statistics. I failed it 1.5 times, only pulling off a C because I begged my professor to go easy on me my senior year of college. I love to code because I can be bad at math and just tell Python to do the hard stuff for me. I'm actually making it one of my goals this year and next year and... probably for the next decade to become at least a bit more proficient with calculus, algebra and statistics for a bunch of projects I want to work on, so I'll probably be blogging about that journey as well.
Pandas and Matplotlib make the very basics, as well as a lot of more complex subjects, way easier.
Pandas is an open source library that makes working with data lightning fast. I'm not even going to try to cover all of the things you can do with Pandas, mainly because I'm probably going to do a deep dive blog on it later, but basically it makes it stupid easy to work with everything from labeled CSVs to complex JSON through a super simple interface. My use case here is ingesting the CSV file full of my social stats and revenue so I can more easily graph it out.
Matplotlib is my go-to for basic graphical visualizations. I'm not going to go into the true depth of the Matplotlib library, partially for the same reason as Pandas and partially because I'm a math moron who wouldn't understand a lot of it anyways, but my use case here is to plot out my social stats on a line graph to see growth over time.
So I've been inputting my social stats and revenue on a daily basis using the code I wrote for the last blog. It's all in a very simple CSV file that I add a line to every morning with my first cup of coffee. My vis() function ingests that CSV file with pandas.read_csv().
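For anyone following along without a stats file of their own, here is a minimal sketch of the ingest step. The column names (date, twitter_followers, revenue), the sample rows and the load_stats() helper are all made up for illustration, so swap in whatever your own CSV actually uses.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real stats file. The column
# names and values are assumptions, not the actual data from this blog.
SAMPLE_CSV = """date,twitter_followers,revenue
2021-03-01,1200,15.50
2021-03-02,1215,0.00
2021-03-03,1248,42.25
"""


def load_stats(source):
    """Read the daily stats CSV into a DataFrame, one row per day."""
    return pd.read_csv(source)


# read_csv accepts a file path or any file-like object, so a StringIO
# works here in place of a file on disk.
stats = load_stats(io.StringIO(SAMPLE_CSV))
print(stats.dtypes)
print(stats.tail(1))
```

With a real file you would pass its path instead of the StringIO object. The date column is left as plain strings here, which is enough for a simple line graph.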
Next, I create a fig object using the pyplot package that's part of the matplotlib library. This fig object is our overall graph, and the lines we add to it will be added as subplots.
That last line, ax.xaxis.set_major_locator(pyplot.MaxNLocator(7)), limits the X axis to about 7 tick labels (MaxNLocator caps the number of intervals between ticks), basically to keep the X axis from looking too cluttered. I also passed a tuple in to the initialization of our fig object to make the graph a bit bigger. The subplot I added was just the date on the X axis and my follower count on the Y axis, pulled from our pandas object.
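Putting those steps together, a function like the one described could look roughly like this. Treat it as a sketch rather than the exact code from my notebook: the plot_followers() name, the column names, the sample data and the (12, 6) figure size are all assumptions for the example.

```python
import io

import pandas as pd
from matplotlib import pyplot

# Hypothetical sample data; column names and values are assumptions.
SAMPLE_CSV = """date,twitter_followers
2021-03-01,1200
2021-03-02,1215
2021-03-03,1248
2021-03-04,1251
2021-03-05,1290
2021-03-06,1302
2021-03-07,1340
2021-03-08,1366
2021-03-09,1371
2021-03-10,1405
"""


def plot_followers(source):
    """Plot follower count over time from a daily stats CSV."""
    stats = pd.read_csv(source)

    # The figsize tuple is (width, height) in inches; (12, 6) is an
    # arbitrary choice that just makes the graph a bit bigger.
    fig = pyplot.figure(figsize=(12, 6))
    ax = fig.add_subplot(1, 1, 1)

    # Date on the X axis, follower count on the Y axis.
    ax.plot(stats["date"], stats["twitter_followers"])

    # Thin out the X axis ticks so the date labels don't pile up.
    ax.xaxis.set_major_locator(pyplot.MaxNLocator(7))
    return fig, ax


fig, ax = plot_followers(io.StringIO(SAMPLE_CSV))
```

In a Jupyter Notebook the figure renders when the cell finishes. Because the dates are plain strings, Matplotlib treats them as categories, so depending on how many rows you have some tick positions can land between two dates and show up blank. Passing integer=True to MaxNLocator is one way to keep ticks on whole positions.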
The final result is a very basic looking graph that shows my growth on Twitter over time! I didn't add axis labels or titles or anything like that since this graph is really just for me, but it would be fairly easy to do so with a couple lines of code.
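If you do want labels, here's a quick sketch of those couple of lines. The data and the label text are placeholders I made up for the example.

```python
from matplotlib import pyplot

# Placeholder data standing in for the real stats.
days = ["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04"]
followers = [1200, 1215, 1248, 1251]

fig, ax = pyplot.subplots(figsize=(12, 6))
ax.plot(days, followers)

# A title plus one label per axis is all it takes.
ax.set_title("Twitter followers over time")
ax.set_xlabel("Date")
ax.set_ylabel("Followers")
```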
Over the next week, I'm going to start writing some API code to pull stats automatically so I don't have to go through every morning and update the CSV file manually. Ideally I want this to be a dashboard I pull up every morning to see if there are any major changes, get insights into what's working and get a feel for how well my side-gig revenue is growing. I'll be adding more visualizations over time as well, but as long as they're just simple line graphs like this one, I probably won't include much about them in the blog.
I've got some other big plans for this project, like functionality to automatically tweet progress pictures at the end of the week and things like that, but I'll leave those under wraps until I've got the ideas more fleshed out.