Data Mining: Difference between revisions
Aplstudent (talk | contribs) Data Mining landing page |
Aplstudent (talk | contribs) No edit summary |
Revision as of 06:30, 14 April 2016
Intro
Data science is a buzzword that means different things to different people. Broadly, it can mean any use of data and data analysis in a scientific process. It is also a growing field at the intersection of machine learning, statistics, information science, and business.
History as I know it
Machine learning is a subfield of artificial intelligence. It focuses on building algorithms that predict or classify outputs from a data set. The field draws on the theory of discrete math and graphs to give a rigorous footing to the process of extracting knowledge. Machine learning is itself part of data mining, which is the search for trends and patterns in large sets of data.
Applications
As the amount of data generated by the internet has grown, machine learning has grown with it. Many applications we use every day rely on these algorithms: face recognition, voice recognition, word prediction, spam detection, cancer detection, genetics, handwritten digit recognition, and text analytics.
There are three types of learning: supervised, unsupervised, and reinforcement. Supervised learning works from labeled examples, and its tasks are regression and classification. Unsupervised learning looks for structure in unlabeled data, for example by clustering similar items. Reinforcement learning learns by trial and error from rewards, and is used for things like the stock market.
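To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch in plain Python. The data, the function names (`fit_centroids`, `predict`, `two_means`), and the choice of a nearest-mean classifier and a one-dimensional two-cluster k-means are all assumptions made for illustration, not methods taken from this page. Reinforcement learning is left out because it needs an environment to interact with.

```python
# Toy illustration only: every name and number here is hypothetical.

def fit_centroids(points, labels):
    """Supervised: use the known labels to learn one mean per label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Classify x as the label whose mean is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled points into two groups (1-D k-means, k=2).

    Assumes the points contain at least two distinct values.
    """
    lo, hi = min(points), max(points)  # start the two centers at the extremes
    for _ in range(iterations):
        group_lo = [x for x in points if abs(x - lo) <= abs(x - hi)]
        group_hi = [x for x in points if abs(x - lo) > abs(x - hi)]
        lo = sum(group_lo) / len(group_lo)
        hi = sum(group_hi) / len(group_hi)
    return group_lo, group_hi


heights_cm = [150.0, 152.0, 155.0, 180.0, 183.0, 185.0]
labels = ["short", "short", "short", "tall", "tall", "tall"]

# Supervised: labels are given, and we predict the label of a new point.
centroids = fit_centroids(heights_cm, labels)
prediction = predict(centroids, 178.0)

# Unsupervised: no labels, the algorithm finds the two groups on its own.
clusters = two_means(heights_cm)
```

The supervised half needs `labels` to learn anything, while `two_means` only ever sees `heights_cm`. That is the whole distinction.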
Overview of the process
Get the data
Explore the data
Create model/score model
Tweak things
Collecting Data:
There are many interesting data sets available for exploration: the UCI Machine Learning Repository, Kaggle, reddit's r/datasets, CKAN (government documents), the Library of Congress, and website APIs.
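The four steps in the overview can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the data is synthetic (generated here rather than downloaded), the model is a straight line fitted by ordinary least squares, and the helper names `fit_line` and `mean_squared_error` are made up for this example.

```python
import random
import statistics

# 1. Get the data. Synthetic here (an assumption): y is roughly 2*x + 1 plus noise.
random.seed(0)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 1.0) for x in xs]

# 2. Explore the data with a few summary statistics.
summary = {
    "mean_x": statistics.mean(xs),
    "mean_y": statistics.mean(ys),
    "stdev_y": statistics.stdev(ys),
}

# Hold out every fourth point so the model is scored on data it has not seen.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 4 != 0]
test = [p for i, p in enumerate(pairs) if i % 4 == 0]


# 3. Create the model and score it.
def fit_line(data):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.mean([x for x, _ in data])
    mean_y = statistics.mean([y for _, y in data])
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return slope, mean_y - slope * mean_x


def mean_squared_error(model, data):
    """Average squared gap between the model's predictions and the true values."""
    return statistics.mean([(model(x) - y) ** 2 for x, y in data])


slope, intercept = fit_line(train)
line_error = mean_squared_error(lambda x: slope * x + intercept, test)

# 4. Tweak things: try a different model and compare scores.
# Here the alternative is a baseline that always predicts the mean training y.
baseline = statistics.mean([y for _, y in train])
baseline_error = mean_squared_error(lambda x: baseline, test)

print(summary)
print("line MSE:", line_error, "baseline MSE:", baseline_error)
```

In practice the tweaking step is a loop: change the model or its inputs, score it again on held-out data, and keep whichever version scores better.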
What to do when struggling:
R and Python both have thorough documentation, so you can find information about any function in them. Also, if you press "tab" after a dot, IPython will show you the options you can write next. In RStudio this is done automatically. There is also a very active community of users and developers, so information on everything is plentiful.
Python Tutorials and blogs
If you've never touched Python, start by doing the first 20 exercises of this: [LPTHW]
Then follow the SciPy primer on this wiki.
To practice Python skills: [Subreddit]
To learn machine learning: This progression of tutorials takes you from a little knowledge of NumPy to a modern neural net with all the bells and whistles. It'll make you feel amazing about what you can do with Python. First tutorial: [NeuralNet] All: [[1]]
Intro to working with datasets: https://www.kaggle.com/c/titanic
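The same information that tab completion and the documentation give you is also available from inside plain Python. The short sketch below uses only the built-in `dir` function and the standard `inspect` module. The choice of `str` and `str.upper` as the thing to look up is arbitrary.

```python
import inspect

# Roughly what tab completion after "some_string." would offer:
# the public attributes of the str type.
options = [name for name in dir(str) if not name.startswith("_")]

# The built-in documentation for one of those options.
doc = inspect.getdoc(str.upper)

print(options[:5])
print(doc)

# In an interactive session, help(str.upper) prints the same documentation.
```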