Saturday, February 21, 2009

Portfolio #4: Clusters of Data Goodness

Part 1: Plug 'n Chug
Not going to say much here, because it was pretty much just plug 'n chug (hence the title).

Part 2
I decided to do my clustering on the number of college students (out of 100) that are able to obtain a bachelor's degree within 3 years versus 6 years, by state. I downloaded the data in two parts (the 3-year grad statistics and the 6-year grad statistics) and then made a new table with both sets of statistics together. I then copied and pasted it into Notepad. Regardless, it didn't work for my data. I assume there is a way I'm supposed to format my txt data file that I don't know (the txt data files that I got for the book were in a format I didn't understand). Here are some screenshots of stuff not working:

<----- This is annoying.

In short: Part 2 was a massive failure. Perhaps I'll wrestle with it more later.
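
For what it's worth, here is a rough sketch of one plausible layout for the txt file. It assumes the clustering code wants a tab-separated table with a header row and the row label (the state) in the first column. That is only a guess about the book's format, not something confirmed. The function names `write_matrix` and `read_matrix` are made up for illustration, and the numbers are placeholders, not the real graduation statistics.

```python
import csv
import io

def write_matrix(rows, columns, out, label_header="State"):
    """Write {label: [values...]} as a tab-separated table with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([label_header] + list(columns))
    for label, values in rows.items():
        if len(values) != len(columns):
            raise ValueError(f"{label}: expected {len(columns)} values")
        writer.writerow([label] + list(values))

def read_matrix(source):
    """Read the table back as (column names, row labels, numeric rows)."""
    reader = csv.reader(source, delimiter="\t")
    header = next(reader)
    labels, data = [], []
    for line in reader:
        if not line:  # skip blank lines left over from copy and paste
            continue
        labels.append(line[0])
        data.append([float(x) for x in line[1:]])
    return header[1:], labels, data

# Placeholder values only, NOT the real graduation statistics.
example = {"State A": [10.0, 50.0], "State B": [20.0, 60.0]}
buffer = io.StringIO()
write_matrix(example, ["3-year", "6-year"], buffer)
buffer.seek(0)
columns, labels, data = read_matrix(buffer)
```

Writing the file from code instead of pasting into Notepad at least rules out stray spaces standing in for tabs, which is one common way a hand-made data file ends up unreadable.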

Part 3: I am Made of Fail
I am apparently made of failure. I put a data set on Many Eyes, but I couldn't get Many Eyes to visualize it. I'm guessing this data set just wasn't clusterable.

Thursday, February 12, 2009

Portfolio #3: "Head Bang" from Hell

Our group (THE FEAR) got together (mostly) last Saturday to work on this. We spent the next couple of hours trying to get the Python code to work properly.

At first, we just dumped pylast.py into the Python library folder and tried to use the functions in pylast. Needless to say, this ended in failure.

Then, we tried looking at what was on the class site. Still didn't work.

After smacking our heads against a wall (figuratively), we found out that we needed to use the keys that the website provides for a user. Luke had a username that he made for this portfolio assignment, so we used his. Shortly thereafter, we got it to work in Python by passing the keys from Last.fm's website as parameters to the various methods, and then we concluded the group meeting (so no GUI work was done during the group meeting).

Later in the week, Luke remade it into C# with a GUI and a bunch of fancy options, as everyone saw in class. If you want to know more about what he did, check out his blog.

Sunday, February 1, 2009

Portfolio #2: I run into brick walls quickly

I downloaded the Python code using the link Dr. Zacharski gave us. I was then quickly stopped from going any further because "import feedparser" wasn't working. I kept getting the error "ImportError: No module named feedparser". I tried to just comment out the import feedparser line in pydelicious.py, but a lot of things in pydelicious.py need feedparser, so that proved to be a useless effort. Instead of banging my head against this wall, I'm just going to continue on to the Weka stuff.
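
That ImportError usually just means the module isn't installed anywhere Python looks. feedparser is a separate package, so installing it (for example with pip) is the likely fix, though I can't say that for certain about this setup. A minimal sketch of checking for a dependency before running anything follows. The helper names `module_available` and `missing_modules` are made up for illustration, and this uses current Python 3 rather than the Python of 2009.

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module `name` can be found on the Python path."""
    return importlib.util.find_spec(name) is not None

def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    return [name for name in names if not module_available(name)]

# Example: report anything the script needs that is not installed.
needed = ["feedparser"]
for name in missing_modules(needed):
    print(f"Missing module: {name} (try installing it, e.g. 'pip install {name}')")
```

Checking up front gives a clearer message than commenting out the import and waiting for everything that depends on it to fall over.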

Part 1: Playing with Weka
Today seems to be a good day to run into brick walls; I spent a good 10 minutes trying to figure out where the examples that came with Weka were on my computer. I had installed Weka on Ubuntu Linux using the Synaptic Package Manager; when I ran Weka and tried to load the example, the folder it defaulted to was not the Weka folder, but the folder my command prompt happened to be in (not surprising, really). Regardless, because I let Synaptic install Weka for me, I had no clue where Weka was installed or where it put its files, so I had to dig around for a while. In the end, however, there was success:
(This is a screenshot of my success!)
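
The digging around can be automated. Here is a small sketch that walks a directory tree looking for Weka's example data, assuming the examples are .arff files (Weka's usual data format). The function name `find_files` is made up for illustration, and which root folder to search (something like /usr/share on Ubuntu) is a guess, not a known install location.

```python
import os
import tempfile

def find_files(root, suffix=".arff"):
    """Return a sorted list of paths under `root` whose names end with `suffix`."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffix):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)

# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data"))
    for name in ("iris.arff", "notes.txt"):
        open(os.path.join(tmp, "data", name), "w").close()
    found = [os.path.relpath(path, tmp) for path in find_files(tmp)]

# For the real search, something like find_files("/usr/share") would be the
# starting point; that path is an assumption about where packages put data.
```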

Part 2: Playing More with Weka
I did the same thing as in Part 1 with the Cleveland Heart Disease data:

The stats are as follows: ~78% correctly classified and ~22% incorrectly classified. This shows that the J48 classifier worked reasonably well for this set, since the majority of the data was classified correctly. However, a good portion of the data was still classified incorrectly, leading me to conclude that classification (or, at least, J48 classification) would not be suitable for this data set or for data pertaining to this content.