Data Mining

Apr 2016

Professor Rabinowitz is a very nice professor. However, in class, he is extremely unorganized and he is not the best at explaining concepts. There were many times when he would try to explain a concept again only for the MA students (and a handful of undergrads) to stare blankly back at him. Furthermore, his homeworks were very confusing. He would constantly have to update the problems because they were not clear. They were also quite challenging and required going to office hours to understand how to solve the problems (even the TAs did not understand sometimes). Despite these failings, though, Dan really does make an effort. He really cares about the students and holds lots of office hours to help students. During reading week, he held at least three final review sessions, each of which was a few hours long. I think that he doesn't usually teach Data Mining, hence his inexperience. However, if you seek help, Dan will go the extra mile to help you out. It's not likely that Dan will teach Data Mining again any time soon, but in case he does, I suggest taking advantage of office hours, since Dan is very helpful then. Also, there is a huge overlap with Statistical Machine Learning in terms of content.

May 2012

Frank Wood is idiosyncratic to say the least. Check out his homepage and you'll see a portrait of him behind which is a background that makes him look enshrined in light. As previous reviewers have said, Wood spends most classes copying his notes, which are notably nearly _identical_ to those of the text, onto the blackboard. He also derides applications of data mining (he's not training you to become a mining 'technician') though he sometimes admits he doesn't understand the (difficult) math that makes this stuff possible. On the first day of class, he seemed to brag about how much of the class was going to drop his course, as if that constituted some signal as to the quality of the course. The bottom line here is that you need _serious_ stat background to get all you can out of the course. Given that the math is difficult, getting a sense of the intuition that motivates results is useful and perhaps even necessary, but he doesn't offer this. Developing intuition is paramount in abstract math courses; data mining is no different and Wood ought to see this. Students are then left to develop intuition on their own. But this is difficult because the examples he gives in class are usually the same examples from the text. Indeed, Data Mining, offered to both undergrads and master's students (even PhD students), is likely a tough course to teach: What can or should Wood assume that you know or don't know? It's tough to say, but it doesn't seem like he's found the right balance.

May 2012

This class was phenomenal. I was on the fence about taking it initially and I am so pleased that I decided to see it out. This class snowballs from this kind of basic, probabilistic manipulation into a new fundamental outlook on the use and future of modern statistics. The homeworks are interesting and help you understand the transformation from formula to code at a perfect pace. The lectures are downright awesome. Frank actually cares about what he's teaching, and he actively engages the class. He is also a very to-the-point guy, which helps to avoid any small misunderstandings that could easily accumulate due to the nature of the material. Best class and best teacher I've had while I've been here, hands down.

Mar 2012

Class was very poorly taught; the "lectures" were pretty much Frank copying from his notes onto the chalkboard. Sometimes he would spend up to a minute copying down long derivations from his notes. Vague questions are also his thing - expect many "what would this mean" questions thrown at the class while everyone is struggling to figure out what aspect he is talking about. One of the TAs (the male one) is essentially unreachable, especially with all the office hour cancellations and his policy of not responding to emails. The other one is more responsive to emails and questions. Learn to love the textbook.

Jan 2012

Chris Volinsky is what you would want out of a professor for a course such as Data Mining. He takes an extremely broad subject matter and distills its component parts without sacrificing technical rigour for the most part. His notes are extremely thorough and credible given his industry background (he won the Netflix Prize a couple of years back). What this course really excels in though is giving you the toolbox to go off on your data mining tangent when you see fit and create something out of nothing. Just be sure you know some scripting language coming in to operate on the data sets, which get increasingly massive as the semester goes on. This could be any one of R, Matlab, Python, etc.; it really doesn't matter, as the course is not taught via implementation.

Dec 2011

I loved this class and this professor. He brings a fascinating perspective, combining industry experience (he works for AT&T) and academic training which gives him a lot of cool stories to tell as well as knowledge about current trends in the wider data mining community. He also won the Netflix Prize (google it if you don't know what it is) and he spends a lot of time telling us about that, which was very cool. He's also a big sports fan so get ready for lots of sports examples (though he won't test you on them). He's generally a big data/stats nerd and often brings in cool examples of research of visualizations. He tends to go for breadth rather than depth and generally doesn't get too technical. I liked this approach -- I'd rather get exposed to more subjects and dive into them more on my own if I'm interested -- but I can understand if someone disagrees. He goes through a general introduction to data mining and then discusses: data visualization, cross-validation techniques, regression, classification, clustering, text mining, web mining, neural nets and support vector machines, ensemble methods, Bayesian methods, recommender systems, and social networks. I would imagine that some of the later topics could be changed if he teaches the class again. In terms of a programming background, it's really very important to know R (or be willing to spend time learning it). You can also get by with SAS or Stata (or similar statistical software) but he often gives tips on how to do things in R specifically, so you will be at a disadvantage. Knowledge of Java/Perl/Python is pretty irrelevant, though I guess it could be useful for your project. The difficulty of the homework assignments heavily depends on your ability in R (or SAS/Stata). If you're proficient, each one will take no more than a few hours. If you're learning R on the fly, they could be time-consuming. The term project takes a lot of time.
He wants you to try many different data mining techniques and, depending on your data set, cleaning/preparing the data could involve hours and hours of work. On top of that, he expects a 10-20 page report including numerous visualizations. I found Chris to be very approachable, although it is a big class and you will have to make an effort to speak to him if you so choose. He's an adjunct so he sometimes shows a lack of polish in his presentations, but he's very passionate about the field and I think that comes through. He's also very active in getting feedback about how to improve: I imagine he will be even better if he teaches again. About half the students are master's students in stat, with the other half being spread around various other master's programs (including finance, journalism, social science, econ). So don't be intimidated if your stat background isn't the strongest, but you definitely need to have taken several stat classes already (I'd say at least through 3107, and 4315 would be very helpful as well).

Dec 2011

This class will kick your ass and make you a better person. ... culpa won't let me submit unless I write more ... not sure what else to say about this class... here's some advice: get to know Matlab early, read Bishop extensively, talk to Prof. Wood and the TAs, try out the math underlying each assignment as best you can, understand the few key probability distributions used extensively in this class and in Statistical Machine Learning, really understand Bayes.

Jun 2011

In short, he's a really good professor. If you are interested in machine learning, I feel that this class does the best job of teaching it at Columbia. Put effort into this class and you will learn an amazing amount. I feel like many of us were very personally invested in this class by the end of the semester (I was for sure). In lectures, he was somehow able to make the whole class so intensely focused and interested in what we were doing---more than any class I've taken at Columbia. Everyone became extremely engaged with what we were studying.

May 2011

Frank makes a class as mathematically technical as Data Mining feel like a seminar or a conversation. This is what distinguishes him as a professor and his class as a jewel not just in the Stats department, but in Columbia's scientific community. I was impressed with how familiar Frank was with my final project, given how many class projects he had to work with. I think this really shows how much Frank cares about his students.

May 2010

Stay away!!! This professor is a terrible teacher, does not like to help his students even during his office hours (he made me feel as if I was wasting his time and did not answer some of my questions), and is not consistent with his grading. His lectures were a waste of time. He used PowerPoint slides that had lots of graphs, but he did not explain what the underlying data were or what the axes on a plot were. Also, he spent the last half of the semester teaching how to code in Python. If I had wanted to learn Python, I would have taken a class from the computer science department. Half of the students stopped coming to class after the midterm, because they had realized that the classes were of no use and the questions in the exam had nothing to do with what we covered in class. I'd recommend everyone run away from this professor. It's a pity too; I was really looking forward to taking a good data mining course.