Thinking about data science and modeling

The Statistical Package for the Social Sciences (SPSS) has always made sense to me. For years, typical workplace tools were less complex than SPSS. The new generation of data-science-focused applications is highly complex. Those applications make it easy to slice and dice data into epic infographics, and they let end users run highly complex models. Some of those models are not well understood, and access to advanced statistical modeling tools does not guarantee understanding. Data science and data modeling fell flat during this last election.

Executive-level reporting has always involved both art and science. Modeling the latest presidential election did not go very well for pollsters and pundits. I have done a ton of data modeling over the years. Anybody can pick up a copy of Armstrong’s Principles of Forecasting (2001), and diving in is really the best way to get started. Data science has turned into a first-in-the-pool type of profession. People strive to predict the future, yet most models aimed at predicting the future are never tested against the past. Modeling public sentiment is a challenge: sometimes it is fleeting, and sometimes it persists.
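Testing a forecasting model against the past is usually called backtesting. Here is a minimal sketch, assuming a simple rolling-origin scheme: at each time step the model sees only the observations before that step, forecasts the next value, and the absolute error is recorded. The two toy models (`naive_forecast`, `moving_average_forecast`), the function names, and the sample numbers are hypothetical illustrations, not real polling data or a recommended method.

```python
from statistics import mean


def naive_forecast(history):
    """Hypothetical baseline: predict the next value as the last observed value."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Hypothetical model: predict the next value as the mean of the last `window` points."""
    return mean(history[-window:])


def backtest(series, model, min_history=3):
    """Rolling-origin backtest returning the mean absolute error.

    At each step t the model only sees series[:t] (the past), and its
    one-step-ahead forecast is compared with the actual value series[t].
    """
    errors = []
    for t in range(min_history, len(series)):
        history = series[:t]  # only data available before time t
        prediction = model(history)
        errors.append(abs(series[t] - prediction))
    return mean(errors)


if __name__ == "__main__":
    # Made-up numbers for illustration only (not real polling data).
    support = [48.0, 47.5, 49.0, 48.5, 50.0, 49.5, 51.0]
    print("naive MAE:", backtest(support, naive_forecast))
    print("moving-average MAE:", backtest(support, moving_average_forecast))
```

A model that cannot beat the naive last-value baseline on past data deserves some skepticism before it is trusted with the future.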

Studying Hadoop

Larger datasets require different analysis solutions than smaller ones. That might seem like a simple argument to make, but it is very important to understand where the tipping point occurs. A long time ago in a faraway basement, I learned that my dataset had outgrown the confines of a single computer. I needed something better: a solution that was scalable and inexpensive. That is when I started studying the Apache Hadoop project.
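Hadoop's original processing engine, MapReduce, scales by splitting the input, running a map function on each split independently, grouping the intermediate key/value pairs by key (the shuffle), and running a reduce function on each group. The sketch below simulates that flow for the classic word-count example in plain, single-process Python. It illustrates the pattern only; it is not Hadoop's Java API or Hadoop Streaming, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper (hypothetical name): emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer (hypothetical name): combine all values for one key into one result."""
    return key, sum(values)


def word_count(lines):
    """Run map -> shuffle -> reduce over an iterable of text lines in one process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big", "data science"]))
```

On a real cluster, the map and reduce calls run in parallel on many machines and the shuffle moves data across the network, which is what lets the same logic grow past a single computer.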

The practitioner data scientist

What exactly do data scientists do in the workplace? I’m working on a larger post about practitioners of data science. Some companies have employed data scientists, or employees who work with advanced applied statistics, for years. Individuals who work with big data to provide business intelligence services do exist.

A long time ago, in a start-up-rich land… I first started working with databases during the second half of 1998. For those of you following the timeline at home, that is about 15 years ago. For more than a decade, I have had the opportunity to work with automated online data collection techniques.

My custom internet spider has been collecting data for years. Last year I completed an update to the spider, rewriting the code during some of my downtime. I routinely publish papers related to social media, data mining, automated data collection, and various methods of computational social network analysis.
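The post does not describe how the spider works, so the following is only a minimal sketch of a generic breadth-first crawler built on the Python standard library, not the author's code. `crawl`, `fetch`, and `LinkExtractor` are hypothetical names, and the `fetcher` parameter is an assumption that makes the download step swappable (for testing, caching, or retries). A production spider should also honor robots.txt (the standard library provides `urllib.robotparser`) and rate-limit its requests, both of which this sketch omits.

```python
# Hypothetical sketch of a breadth-first web spider; not the author's implementation.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Default fetcher: download a page and decode it as UTF-8, replacing bad bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetcher=fetch):
    """Breadth-first crawl from start_url; returns {url: html} for visited pages."""
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetcher(url)
        except OSError:
            continue  # skip pages that fail to download (URLError is an OSError)
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Usage: `pages = crawl("https://example.com/", max_pages=10)` returns a dictionary mapping each visited URL to its raw HTML, which can then be parsed or stored for later analysis.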