Data Mining

From APL_wiki
Revision as of 01:59, 20 March 2016 by Aplstudent (Data Mining landing page)

Nearly every field now generates huge amounts of data. Data mining has been applied in fields such as artificial intelligence, the humanities, biology, physics, marketing, and operations. Statistics, machine learning, programming, problem solving, information science, communication, and visualization have combined to form the field of data science.

The goal of this project is to explore the data mining research field and to acquire skills in Python and R. Tutorials I find helpful will be linked here, and there will also be a series of tutorials on a variety of useful packages, linked to GitHub.

Current research project: Clustering and Data Visualization in the Chronicling America Archives. The pages can be retrieved through the API (follow my code) or as a bulk download of the files. I am working on getting access to the entire collection.
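The API route can be sketched as follows. This is a minimal Python sketch, not the project's own code: the endpoint `https://chroniclingamerica.loc.gov/search/pages/results/`, the query parameters `andtext`, `rows`, `page` and `format`, and the response fields `items`, `title`, `date` and `ocr_eng` are assumptions based on the older public Chronicling America search API, which may have changed, so check the current Library of Congress API documentation before relying on them. The helper names `build_search_url` and `fetch_page_texts` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the (older) Chronicling America page-search API.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(terms, rows=20, page=1):
    """Build a JSON search URL for newspaper pages containing all the given terms.

    The parameter names (andtext, rows, page, format) are assumptions
    based on the older public API.
    """
    params = {"andtext": terms, "rows": rows, "page": page, "format": "json"}
    return SEARCH_URL + "?" + urlencode(params)


def fetch_page_texts(terms, rows=20, page=1):
    """Download one page of results and return (title, date, OCR text) tuples.

    Requires network access; the response field names are assumptions.
    """
    with urlopen(build_search_url(terms, rows, page)) as response:
        data = json.load(response)
    return [
        (item.get("title"), item.get("date"), item.get("ocr_eng", ""))
        for item in data.get("items", [])
    ]


if __name__ == "__main__":
    # Only prints the query URL; call fetch_page_texts(...) to actually download.
    print(build_search_url("women suffrage", rows=5))
```

Only `fetch_page_texts` touches the network. Keeping URL construction in its own function makes it easy to check queries offline before downloading the OCR text that would later be clustered.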

Teaching modules to be added:
- Using APL resources to use the IPython 2 Notebook
- RStudio on Windows
  - packages
- Setting up IEP and Anaconda on Windows
  - packages and pip
- Python and R: loading data in various formats
- Web scraping (HTML/XML/CSS)
- The power of pandas
- Starting with sklearn
- Introduction to Theano and a progression of models to build


What to do when struggling:

Both R and Python have extensive documentation, so you can look up information about any function. Also, if you press Tab after a dot, IPython will show you the attributes and methods you can write next; RStudio shows these completions automatically as you type. There is also a very active community of users and developers, so answers to most questions are easy to find.
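A small Python illustration of reading documentation from inside the environment, using the standard-library `statistics` module purely as an example. `help()` prints a function's signature and docstring, and `dir()` lists the names an object provides, which are the same names IPython offers after a dot and Tab. In IPython, `statistics.mean?` shows the same docstring, and in R the equivalent is `?mean`.

```python
import statistics

# help() prints the signature and docstring of any module, class, or function.
help(statistics.mean)

# dir() lists the names an object provides; these are the same completions
# IPython offers when you type "statistics." and press Tab.
public_names = [name for name in dir(statistics) if not name.startswith("_")]
print(public_names)

# The docstring is also available as a plain string.
first_line = statistics.mean.__doc__.splitlines()[0]
print(first_line)
```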


Programming is a skill that becomes easier to learn the longer you do it. This is the process I follow to build an understanding of a problem:

1. Start researching!
   - Find an example of the function being used correctly by googling "(name of package/function) example".
   - Read a tutorial on how to use it: google "(name of package/function) tutorial".
   - Read the documentation for the package: google "(name of package/function) documentation".
   - Use Python's help() and R's ? to read the documentation inside the environment.
2. Start programming!
   - Simplify your problem and make the smallest possible piece of code work.
   - Repeat the research step if new problems come up.
   - Build up the problem, checking the code as you go, until it is finished.
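As a small illustration of the "start programming" steps, the Python sketch below first gets one tiny function working and checks it, then builds on it. The word-counting task and the helper names `tokenize` and `word_counts` are hypothetical examples loosely tied to newspaper text, not code from the project, and the regular-expression tokenizer is a deliberate simplification.

```python
import re
from collections import Counter


# Step 1: the smallest piece that can work on its own:
# split one string into lowercase words (letters and apostrophes only).
def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


# Check the small piece before building on it.
assert tokenize("The Evening Star, Washington") == ["the", "evening", "star", "washington"]


# Step 2: build up the problem by counting words across several
# documents, reusing the piece that is already known to work.
def word_counts(documents):
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


# Check again after each extension.
counts = word_counts(["The storm passed.", "The storm returned at night."])
assert counts["storm"] == 2
assert counts["the"] == 2
print(counts.most_common(3))
```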


Python tutorials and blogs:

If you have never used Python, start with the first 20 exercises of Learn Python the Hard Way: [LPTHW]

Then work through the SciPy primer on this wiki.

To practice your Python skills: [Subreddit]

To learn machine learning: this progression of tutorials takes you from a little knowledge of NumPy to a modern neural net with all the bells and whistles. It'll make you feel amazing about what you can do with Python. First tutorial: [NeuralNet] All tutorials: [[1]]
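The starting point of such a progression is usually a single sigmoid neuron, which is the same thing as logistic regression. The sketch below is not taken from the linked tutorials: it trains one neuron with per-sample gradient descent on the logical AND function, written in plain Python so it runs without extra installs (with NumPy the inner loops become vector operations). The names `sigmoid`, `train_neuron`, `predict` and `AND_DATA`, the learning rate, and the epoch count are illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Output of one sigmoid neuron: sigmoid(w . x + b)."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, epochs=5000, learning_rate=0.5, seed=0):
    """Train one sigmoid neuron with per-sample gradient descent.

    samples: list of (inputs, target) pairs with target 0 or 1.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # With cross-entropy loss, the gradient with respect to the
            # pre-activation w . x + b is simply (prediction - target).
            error = predict(weights, bias, inputs) - target
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so a single neuron can learn it.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_neuron(AND_DATA)
    for inputs, target in AND_DATA:
        print(inputs, target, round(predict(w, b, inputs), 3))
```

A single neuron cannot learn XOR, which is not linearly separable; that limitation is what motivates adding hidden layers in the later steps of a neural-net progression.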

Introduction to working with datasets: https://www.kaggle.com/c/titanic
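A typical first step with the Titanic data is a grouped summary, such as the survival rate by sex. The sketch below uses only the standard `csv` module; the helper name `survival_rate_by` is hypothetical, and `TOY_CSV` holds made-up rows (not real passengers) that use a subset of the columns in Kaggle's `train.csv`, where `Survived` is 0 or 1.

```python
import csv
import io
from collections import defaultdict


def survival_rate_by(rows, column):
    """Return {value of column: fraction of rows with Survived == "1"}."""
    survived = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[column]
        total[key] += 1
        if row["Survived"] == "1":
            survived[key] += 1
    return {key: survived[key] / total[key] for key in total}


# Made-up toy rows in the layout of a few train.csv columns (not real data).
TOY_CSV = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
4,0,1,male,54
"""

if __name__ == "__main__":
    # For the real file, use open("train.csv", newline="") instead of io.StringIO(...).
    rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
    print(survival_rate_by(rows, "Sex"))
```

With pandas, the same summary is a one-liner: `pd.read_csv("train.csv").groupby("Sex")["Survived"].mean()`.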