Clustering With K-Means in Python

The Data Science Lab

A very common task in data analysis is grouping a set of objects into subsets such that the elements within each group are more similar to one another than to elements in other groups. The practical applications of such a procedure are many: given a medical image of a group of cells, a clustering algorithm could help identify the centers of the cells; given the GPS data from a user’s mobile device, clustering can reveal their most frequently visited locations within a certain radius; and for any set of unlabeled observations, clustering helps establish whether the data has some structure that might indicate it is separable.

Mathematical background

The k-means algorithm takes a dataset X of N points as input, together with a parameter K specifying how many clusters to create. The output is a set of K cluster centroids and a labeling…
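The excerpt stops before the algorithm itself. Here is a minimal pure-Python sketch of the standard iterative approach (Lloyd's algorithm), not the original post's code. It alternates between two steps until the labels stop changing: assign every point to its nearest centroid, then move each centroid to the mean of the points assigned to it. Starting from K randomly sampled data points, and the `max_iter` and `seed` parameters, are my assumptions.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster equal-length numeric tuples into k groups with Lloyd's algorithm.

    Returns (centroids, labels): k centroid tuples and, for each input point,
    the index of the centroid it was assigned to.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Assumed initialization: k data points sampled without replacement.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point gets the index of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # labels unchanged since the last update, so centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

For example, `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` puts the first two points in one cluster and the last two in the other. Plain Python keeps the sketch dependency-free, at the cost of speed on large datasets.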


Supervised and Unsupervised learning



Supervised vs Unsupervised Learning

Supervised Learning Definition (Wikipedia)

Supervised learning is the data mining task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a “reasonable” way.
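To make the definition concrete, here is a minimal sketch of about the simplest supervised learner, a 1-nearest-neighbour classifier (my choice of example, not part of the quoted definition). “Training” just stores the labeled (input, output) pairs, and the inferred function labels an unseen input with the output of the closest training example. The function names and toy data are hypothetical.

```python
import math


def fit_nearest_neighbor(examples):
    """Return an inferred function built from labeled (input, output) pairs.

    1-nearest-neighbour "training" just stores the examples; the returned
    function labels a new input with the output of the closest stored input.
    """
    if not examples:
        raise ValueError("need at least one labeled example")
    training = [(tuple(x), y) for x, y in examples]

    def predict(x):
        _, label = min(training, key=lambda pair: math.dist(pair[0], x))
        return label

    return predict


# Toy labeled training data: (input vector, desired output value) pairs.
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
classify = fit_nearest_neighbor(training_data)
print(classify((2.0, 1.0)))  # an unseen instance; prints "small"
```

Memorising the data is the crudest form of generalization, but it shows the shape of the task: labeled pairs go in, and a function for unseen inputs comes out.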

Unsupervised Learning Definition (Wikipedia)

In data mining, the problem of unsupervised learning is that of trying to find hidden structure…


Find a job through Twitter!

Hello Folks!

So, here goes... First post! I always wanted to start a blog, and even had one in the past! But as usual, NO TIME!!! Finally got some time after a loooooooooooooooong time 😛

Got one idea: finding a job through tweets! Yes, some companies do tweet about job openings. Either way, it will be fun and informative to play with Twitter data.

First of all, make sure you have installed Anaconda, which comes with IPython Notebook!

That’s it! Nothing else is needed! Once the installation has finished, open IPython Notebook from the “Launcher” and create a new IPython Notebook. It looks like this:

[Screenshot, 2016-02-02: Anaconda Launcher, open IPython Notebook]


[Screenshot: new IPython Notebook]

Alrighty... here comes the fun part! Now the Python stuff starts. Make sure you complete the following checklist:

  • Create an account on Twitter
  • Generate authentication tokens by following the instructions here.
  • Add your tokens to the twitter.cfg file. (API Key == Consumer Key)
  • Assuming you’ve already installed pip, you can install TwitterAPI with the following command (read more about TwitterAPI!):

pip install TwitterAPI

That’s it! Let’s start coding now...

1) Import the necessary modules

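The import cell only survives as a screenshot, so this is a guess at a minimal version rather than the original. It imports the `TwitterAPI` class from the package installed above and prints a hint instead of crashing if the package is missing. Falling back to `None` is my assumption; later cells would need to check for it.

```python
import sys

try:
    # TwitterAPI is the third-party package installed with `pip install TwitterAPI`.
    from TwitterAPI import TwitterAPI
except ImportError:
    # Keep the notebook usable and tell the reader how to fix the missing package.
    TwitterAPI = None
    print("TwitterAPI is not installed; run: pip install TwitterAPI", file=sys.stderr)
```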

2) Create a Twitter API instance. Add your credentials to the ‘twitter.cfg’ file and put it in the same folder as your current IPython notebook! Here is what you have to add inside ‘twitter.cfg’:

consumer_key: consumer_key
consumer_secret: consumer_secret
access_token: token_number
access_token_secret: token_secret

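Here is a minimal sketch of reading that file and creating the API instance. It assumes `twitter.cfg` holds one `key: value` pair per line exactly as shown above, and the helper names `parse_twitter_config`, `read_twitter_config` and `get_twitter` are hypothetical. The four values go to the `TwitterAPI` constructor in this order: consumer key, consumer secret, access token, access token secret.

```python
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")


def parse_twitter_config(lines):
    """Turn 'key: value' lines into a dict and check that all four keys exist."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue  # skip blank lines, comments and headers such as [twitter]
        key, value = line.split(":", 1)  # split once so values may contain ':'
        settings[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError("twitter.cfg is missing: " + ", ".join(missing))
    return settings


def read_twitter_config(path="twitter.cfg"):
    """Read the credentials file that sits next to the notebook."""
    with open(path, encoding="utf-8") as handle:
        return parse_twitter_config(handle)


def get_twitter(path="twitter.cfg"):
    """Create a TwitterAPI instance from the credentials in twitter.cfg."""
    from TwitterAPI import TwitterAPI  # imported here so parsing works without it

    cfg = read_twitter_config(path)
    return TwitterAPI(cfg["consumer_key"], cfg["consumer_secret"],
                      cfg["access_token"], cfg["access_token_secret"])
```

In the notebook, `api = get_twitter()` then gives you the instance for the requests in the next steps. Keeping the parsing separate from the file access makes it easy to check the config without touching Twitter.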

3) Create a request to Twitter

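Here is a sketch of the request. It assumes the v1.1 `search/tweets` endpoint that was current when this post was written; Twitter's API access has changed considerably since then, so check what your account can use. `q` is the search query, and that endpoint caps `count` at 100 results per request. `search_tweets` is a hypothetical helper name.

```python
def search_tweets(api, query, count=15):
    """Send one search request through a TwitterAPI instance and return the response.

    Assumes the v1.1 'search/tweets' endpoint: 'q' is the query and 'count'
    is capped at 100 results per request.
    """
    if not query:
        raise ValueError("query must not be empty")
    return api.request("search/tweets", {"q": query, "count": min(int(count), 100)})
```

With `api` being the TwitterAPI instance from step 2, `r = search_tweets(api, "python job", 50)` returns a response you can iterate over, one tweet per item.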

4) Create an input to enter the search query:

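Here is a minimal sketch of the input cell. `ask_query` is a hypothetical helper that wraps Python's built-in `input()` and rejects an empty query. Passing the reader function in as a parameter is just a design choice that makes the cell easy to test without typing.

```python
def ask_query(read=input):
    """Prompt for a search query and return it without surrounding spaces."""
    query = read("Enter search query (e.g. python developer hiring): ").strip()
    if not query:
        raise ValueError("search query must not be empty")
    return query
```

In the notebook, `query = ask_query()` shows the prompt and waits for your answer.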

5) Create an input to enter the number of tweets we want to fetch from Twitter:

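Here is a similar sketch for the tweet count. It assumes we want an integer between 1 and 100 (the per-request maximum of the v1.1 search endpoint) and falls back to an assumed default of 15 when the reader just presses Enter. `ask_count` is hypothetical.

```python
def ask_count(read=input, default=15, maximum=100):
    """Prompt for how many tweets to fetch and clamp the answer to 1..maximum."""
    raw = read(f"How many tweets? (1-{maximum}, default {default}): ").strip()
    if not raw:
        return default
    number = int(raw)  # raises ValueError for non-numeric input
    return max(1, min(number, maximum))
```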

6) Check the fetched tweets:

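Here is a sketch of a quick sanity check on the response. It assumes the response object exposes `status_code` and is iterable, which TwitterAPI's `TwitterResponse` provides. The check fails loudly on a non-200 status, keeps only items that actually carry tweet `text`, and reports how many came back. `collect_tweets` is a hypothetical name.

```python
def collect_tweets(response):
    """Check the HTTP status, keep only real tweets and report how many arrived."""
    status = getattr(response, "status_code", 200)
    if status != 200:
        raise RuntimeError(f"Twitter request failed with HTTP status {status}")
    tweets = [item for item in response if "text" in item]
    print(f"Fetched {len(tweets)} tweets")
    return tweets
```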

7) Print tweets:

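Finally, here is a sketch of printing the tweets. It assumes each one is the usual tweet dict, with the author in `tweet["user"]["screen_name"]` and the message in `tweet["text"]`, and it collapses newlines so each tweet fits on one line. `format_tweet` and `print_tweets` are hypothetical helpers.

```python
def format_tweet(tweet):
    """Render one tweet dict as '@screen_name: text' on a single line."""
    user = tweet.get("user", {}).get("screen_name", "unknown")
    text = " ".join(tweet.get("text", "").split())  # collapse newlines and extra spaces
    return f"@{user}: {text}"


def print_tweets(tweets):
    """Print the tweets as a numbered list."""
    for number, tweet in enumerate(tweets, start=1):
        print(f"{number}. {format_tweet(tweet)}")
```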


~ Check out the whole code on my GitHub: Code

~ Follow me on Twitter: @iSnehal_SP