
Instagram Data Scraper | The Ultimate Guide [2020]

October 8, 2020

This guide covers how to scrape Instagram data, be it emails, phone numbers, bio, hashtags, or images. The same principles apply to collect data from your own followers or the followers of any Instagram influencer (such as your competitors).

A short summary of what we’ll talk about:

  1. How to Build an Instagram Scraper Tool
  2. The Pros and Cons of Building an Instagram Data Scraper
  3. Buying Scraped Instagram Data
  4. Scrape Comments From Instagram
  5. Instagram Image Scraper
  6. Export Instagram Likes
  7. Scrape Instagram Hashtag

How to Build an Instagram Scraper Tool

1. Use the Unofficial Instagram API To Access the Data

The official Instagram API, which was disabled on June 29, 2020, let you pull your own data, such as your comments or posts on Instagram. Yet it was useless if you wanted (publicly available) quality data such as emails, phone numbers, or bios from other people’s accounts.

However, Instagram is primarily a mobile app, which means the app has to talk to Instagram’s servers through an unofficial API (known as mobile endpoints). So, with the help of open-source software and by intercepting that traffic, we can see how the API works and use it for data scraping.

You can read more about this in my Instagram email finder article.

2. Python / PHP Script for Automation

To access the unofficial Instagram API, we call the mobile endpoints from Python, PHP, or really anything that can log in to an Instagram account and scrape the data. It’s only a few lines of Python code (see the example below), but there is much more to it. Here is how you can log in to a profile to use the unofficial Instagram API.

All code samples are taken from public GitHub repos, like the one listed here.

def login(self, force=False):
        """
            Authenticate this API instance.
            If already logged in (and not later logged out) does nothing (unless forced).
        :param force: if true, will attempt to log in even if already logged in.
        :return: dictionary of responses.
        """
        if not self._isloggedin or force:
            self._session = requests.Session()
            # if you need proxy make something like this:
            # self._session.proxies = {"https": "http://proxyip:proxyport"}
            full_response = self._sendrequest(
                'si/fetch_headers/?challenge_type=signup&guid=' + self.generate_uuid(False), login=True)
            data = {
                'phone_id': self.generate_uuid(True),
                '_csrftoken': full_response.cookies['csrftoken'],
                'username': self._username,
                'guid': self._uuid,
                'device_id': self._deviceid,
                'password': self._password,
                'login_attempt_count': '0'}
            try:
                full_response = self._sendrequest(
                    'accounts/login/',
                    post=self._generatesignature(json.dumps(data)),
                    login=True)
            except InstagramAPIBase._2FA_Required as exception:
                # In order to login, need to provide the second factor (i.e. SMS code or backup code).
                # Use call-back to get this string.
                if not self._two_factor_callback:
                    raise AuthenticationError("This account requires support for Two-Factor Authentication")
                two_factor_info = exception.two_factor_info
                verification_string = self._two_factor_callback(two_factor_info)
                data = {
                    'verification_code': verification_string,
                    'two_factor_identifier': two_factor_info['two_factor_identifier'],
                    '_csrftoken': full_response.cookies['csrftoken'],
                    'username': self._username,
                    'device_id': self._deviceid,
                    'password': self._password,
                }
                full_response = self._sendrequest(
                    'accounts/two_factor_login/',
                    post=self._generatesignature(json.dumps(data)),
                    login=True)
            self._isloggedin = True
            decoded_text = json.loads(full_response.text)
            self._loggedinuserid = decoded_text["logged_in_user"]["pk"]
            self._ranktoken = "%s_%s" % (self._loggedinuserid, self._uuid)
            self._csrftoken = full_response.cookies["csrftoken"]
            return decoded_text

Now, there is a lot more going on in the background than a simple login request; that part is easy. You have to use a proxy (preferably from the location you are already in), complete any captchas if necessary, and then let the profile rest for a few days before you begin scraping, because Instagram is now “watching you”.

The login request is probably the most problematic one because it prompts Instagram’s algorithm to keep an eye on you, so please be careful.

Just a note: if you don’t want to bother with any of this and would rather get emails already scraped by a company that does this at scale, fill in the form below and we’ll get you that data ASAP.

3. Instagram Profiles for Scraping the Data

Maybe you are only starting out with email or SMS marketing and don’t need a huge amount of data. Good for you, because the safest way to scrape data is to do it by hand. Just visit profiles and copy-paste their info. It is safe because peeking into people’s profiles is something everyone does, so Instagram has nothing unusual to detect.

And this is exactly what we do, but at scale and fully automated.

We currently own over 1,500 Instagram accounts that have only one goal: to simulate human behavior on Instagram’s mobile app while extracting every available data point of a profile. We do this because Instagram allows only a very small number of API calls per account. The scraping capacity of a single profile is therefore really low, and without 1,500 accounts the crawling would take forever.
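To see why the number of accounts matters so much, here is a back-of-the-envelope estimate. Only the 1,500 accounts come from the text above; the 200 lookups per account per hour and the one million target profiles are made-up numbers for illustration, not Instagram’s real limits.

```python
def crawl_hours(total_profiles, accounts, calls_per_account_per_hour):
    """Hours needed if every account works in parallel at the same rate."""
    return total_profiles / (accounts * calls_per_account_per_hour)


# Hypothetical numbers, for illustration only.
TOTAL_PROFILES = 1_000_000
CALLS_PER_HOUR = 200

single = crawl_hours(TOTAL_PROFILES, 1, CALLS_PER_HOUR)
fleet = crawl_hours(TOTAL_PROFILES, 1500, CALLS_PER_HOUR)
print(f"1 account: {single:.0f} h, 1500 accounts: {fleet:.1f} h")
```

Under these assumptions a single account would need months of non-stop crawling, while the whole fleet finishes in an afternoon.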

Now if you want to buy Instagram profiles and scrape yourself, there are two important things to note:

  1. ALWAYS use aged Instagram profiles and have phone numbers that you can validate them with; otherwise they’ll be banned almost immediately
  2. NEVER use your personal profile

You can purchase Instagram profiles from:

  • Facebook pages 
  • Instagram direct messages
  • and even on dedicated online marketplaces across the world

But even if you manage to buy and log in with all those profiles, you’ll still face many challenges. Instagram is pretty smart nowadays and can recognize profiles that were bought on the so-called gray market. However, a few sellers are really good at creating fakes that are hard to detect. To find one of these pros, just search for the most expensive sellers on the market.

4. Proxies for Remaining Undetected

A proxy is a third-party server that routes your requests through its own IP address. When you use a proxy, Instagram no longer sees your IP address but the proxy’s, which lets you run many profiles from a single server of your own. But remember not to put too many profiles behind one IP: logging in more than 5 profiles on the same IP is a huge no-no.

Proxies have the same problem as Instagram profiles. Instagram detects thousands of proxy providers, and until you find a good one you’re in a lot of trouble, because if Instagram bans the proxy you use, the associated Instagram profile automatically becomes unavailable too. To check whether your proxy provider is still off the radar, paste your proxy’s IP into this website. If it’s a known provider it will be listed there, and since this website knows, trust me, the all-seeing Zuckerberg eye knows too.

The Pros and Cons of Building an Instagram Data Scraper

Pros:

  • Full control of the whole process 
  • The contact data you acquire can be resold or rented
  • You can use the data to scale your business

Cons:

  • No targeting or segmentation
  • Fake Accounts and Bots
  • Invalid Emails, Spam Traps, Catch-all
  • Security Risks
  • Very limited data points

No targeting or segmentation

There’s this phrase: “Data is the new oil”. It means that data, just like oil, becomes valuable only after it has been refined (segmented, analyzed) and is ready to use.

Most businesses target specific audiences and try to avoid people that don’t belong in these groups. Even if you scrape the followers of your main competitor (which is as targeted as you can get), there are still going to be a lot of people that don’t fit in your target market. 

 This may be due to location, gender, age, etc.

Second, if the data is scraped for reaching out to people through email or mobile, you need to be able to segment people to personalize the message. And again, the most basic filters for data segmentation would be location, gender, age.
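As a rough sketch of what such segmentation looks like once you do have those three attributes, the snippet below filters a contact list by location, gender, and age range. The `Contact` class, the `segment` function, and the sample records are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    email: str
    location: Optional[str] = None
    gender: Optional[str] = None
    age: Optional[int] = None


def segment(contacts, location=None, gender=None, min_age=None, max_age=None):
    """Return contacts matching every filter that is set; unknown values never match."""
    result = []
    for c in contacts:
        if location is not None and c.location != location:
            continue
        if gender is not None and c.gender != gender:
            continue
        if min_age is not None and (c.age is None or c.age < min_age):
            continue
        if max_age is not None and (c.age is None or c.age > max_age):
            continue
        result.append(c)
    return result


# Made-up sample data.
contacts = [
    Contact("a@example.com", "Chicago", "female", 29),
    Contact("b@example.com", "Chicago", "male", 41),
    Contact("c@example.com", "New York", "female", 35),
    Contact("d@example.com", "Chicago", "female", None),
]
chicago_women_25_40 = segment(
    contacts, location="Chicago", gender="female", min_age=25, max_age=40
)
print([c.email for c in chicago_women_25_40])
```

Note that the contact with an unknown age is left out as soon as an age filter is set, which is exactly why missing data points hurt segmentation.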

Fake Accounts & Bots

The fame of Instagram influencers gave rise to the fake followers’ industry. How? 

Influencers are paid by the number of followers and their engagement stats; meaning that if these numbers go up, so does their paycheck. That’s why influencers of any size, be it mega, macro, micro or nano are all buying fake followers that will artificially boost their popularity.

One study found that one in ten Instagram accounts is fake, which adds up to more than 95 million fake accounts currently active.

The problem in our case is that fakes are getting so good that it’s not easy to tell a fake profile from a real personal one. And in recent years they have started including fake emails and phone numbers in the bio or on the profile page.

This can ruin your scraping strategy and email campaign since Google will notice that you’ve contacted too many nonexistent emails and will put all your future emails in spam.   

Fun fact: 65% of all “semi-famous” people from LA have, at some point, bought fake followers.

Invalid Email, Spam Traps, Catch-All 

Pretty much the same problem as with fake accounts. If you cannot detect invalid emails, spam traps, and catch-alls you will end up in spam. 

This is why you must use email validation software like NeverBounce. However, make sure to check out their pricing since it can get really expensive if you do this at scale. 
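Since validation services usually charge per address, it can help to run a cheap local pre-filter first. The sketch below only removes duplicates and obviously malformed addresses; it cannot detect spam traps or catch-alls, so it does not replace a validation service. The regular expression is deliberately simple and is an assumption of this example, not a full email grammar.

```python
import re

# Deliberately simple pattern: something@domain.tld with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def prefilter_emails(raw_emails):
    """Trim and lower-case, drop malformed entries and duplicates, keep first-seen order."""
    seen = set()
    cleaned = []
    for raw in raw_emails:
        email = raw.strip().lower()
        if not EMAIL_RE.match(email) or email in seen:
            continue
        seen.add(email)
        cleaned.append(email)
    return cleaned


# Made-up sample input.
sample = ["Jane@Example.com ", "jane@example.com", "not-an-email", "bob@shop.co", "x@y"]
print(prefilter_emails(sample))
```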

Security Risks 

The number one weapon Instagram uses to detect bot activity is its API call limit. If you go over their limit you instantly get blocked even if your account is aged. 

Very Limited Data Points

I’ve previously stated that you need gender, age, and location as the basis for any segmentation. These data points cannot be scraped directly from Instagram. However, there are workarounds to get to a certain level of segmentation. 

For example, if you want to scrape people based only on location, you can search for local hashtags such as #NewYork, #Chicago, or #SanFrancisco (hashtags cannot contain spaces).
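If you start from a list of city names, a small helper can turn them into hashtags. This is a minimal sketch; the function name is made up, and the lower-casing is only there to keep our own list free of duplicates.

```python
import re


def to_hashtag(place_name):
    """Turn a place name into a hashtag by dropping everything except letters, digits and underscores."""
    tag = re.sub(r"[^\w]", "", place_name)
    return "#" + tag.lower()


cities = ["New York", "Chicago", "San Francisco", "St. Louis"]
print([to_hashtag(city) for city in cities])
```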

To get the gender of a profile we use advanced AI. The AI checks every image of a profile, detects the face that appears most frequently, and analyzes it until it spits out the gender. Currently, we’re at an 8 out of 10 success rate for gender. Age is a different story.

For an AI, age is much more difficult to determine since people do all sorts of things (like dyeing their hair) to hide their true age. So far we can correctly determine age on 4 out of 10 Instagram profiles.


Buying Scraped Instagram Data

I would say that developing an Instagram scraper tool is a luxury that only well-established businesses can afford. For smaller companies, it’s just not feasible to go through this entire process and still scrape at a level that brings the much-needed ROI.

An easier solution is to just buy scraped data from Instagram (like emails and phone numbers) and immediately use it for marketing purposes.

Email marketing is the most profitable channel out there. With our help, 200+ businesses have entirely cut out their ad spend on Google and Facebook Ads and relied solely on the email lists we provide.

Making email marketing the only channel for acquiring new customers is something that was not possible before. But Instagram is a platform where people share a lot of information about their life. Our secret sauce is that we’ve learned not only to scrape people’s data but to also understand them. This allowed us to offer businesses contact data of people that are very similar to their ideal customers. And with a database of over 75 million people, we can do that at an unprecedented scale. 


If you’re interested in getting email lists of people that are from your niche, book a free strategy call with us here.

 

Get Data From Instagram

Email Addresses, phone numbers, bio & anything else you might find on Instagram

Instagram Email Scraper – All code samples from one GitHub Repo

Use this GitHub repo for the code samples!

Note: All of these code samples were working while we were writing this blog post. We do not guarantee that they will work forever because, as we said earlier, Instagram keeps changing its code, and you need a dedicated developer if you want to execute this correctly.

Once you are logged in with an Instagram account and through a specific proxy, getting data should be “easy enough”. You only need the API endpoints.

 

The one for email is: /api/v1/users/{{user_id}}/info/. 

Field                   Description
user.public_email       Email address
user.username           The username
user.is_private         Whether this is a private account
user.full_name          User’s full name
user.profile_pic_url    User’s profile photo URL
user.biography          User’s bio
user.external_url       User’s website
user.follower_count     Follower count
user.following_count    Following count
user.media_count        Number of posts

These are all the data points you’ll be getting with that API call.
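Once you have the JSON response, pulling those fields into a flat record could look roughly like this. The sketch assumes the response is a dictionary with a top-level `user` object, as the field names above suggest; the sample payload is made up and nothing here talks to Instagram.

```python
FIELDS = [
    "public_email", "username", "is_private", "full_name", "profile_pic_url",
    "biography", "external_url", "follower_count", "following_count", "media_count",
]


def extract_user_fields(payload):
    """Flatten the `user` object of a response into a dict; missing fields become None."""
    user = payload.get("user", {})
    return {field: user.get(field) for field in FIELDS}


# Made-up payload in the assumed shape.
sample_payload = {
    "user": {
        "username": "example_shop",
        "full_name": "Example Shop",
        "is_private": False,
        "public_email": "hello@example.com",
        "follower_count": 1200,
    },
    "status": "ok",
}
row = extract_user_fields(sample_payload)
print(row["public_email"], row["biography"])
```

Filling missing fields with `None` keeps every record the same shape, which makes a later CSV or database export simpler.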
Check out how you can boost the ROI of your email marketing by 700% in this Instagram Email Scraper article.

 

Instagram Phone Number Extractor

The phone number “extractor” uses, believe it or not, the same API call as the email, provided the user has made the number publicly available.

/api/v1/users/{{user_id}}/info/

So in addition to the data points mentioned above, you’ll also get the phone number.

You should know that only 10-15% of all Instagram users share their personal phone numbers publicly. Although the percentage is relatively small, let’s not forget that 15 percent of all Instagram users is roughly 150 million people.

Scrape Comments From Instagram

One thing I really hate about scraping comments from Instagram is that you get a ton of automated comments and comments from engagement groups, so make sure to watch out for those. Anyhow, I feel generous, so let me tell you how to scrape comments in two ways:

  1. Through the mobile API (like the Instagram phone number extractor and email scraper calls above)
  2. Through the web API (limited but super fast and easy to do)

For 1), use /api/v1/media/{{post_id}}/comments/. You’ll get all of the comments, assuming you have the post ID. It’s easy to find if you’ve already scraped the posts; otherwise, just open the photo in the browser and copy the ID, which is here:

As for 2), take a look at the code sample below. That should get you from 0 to scraping comments (unless Instagram makes changes to their code, and they always do!).

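import sys
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

import excel_exporter  # helper module that ships with the sample repo, not a PyPI package

# Assumption: argv[1] is the post URL and argv[2] the maximum number of "load more" clicks.
driver = webdriver.Chrome()
driver.get(sys.argv[1])
max_clicks = int(sys.argv[2])

# Close the pop-up, if one is shown.
try:
    driver.find_element(By.CLASS_NAME, 'xqRnw').click()
except WebDriverException:
    pass

# Keep clicking "load more comments" until the button is gone or the limit is reached.
load_more_selector = '.MGdpg > button:nth-child(1)'
try:
    load_more_comment = driver.find_element(By.CSS_SELECTOR, load_more_selector)
    i = 0
    while load_more_comment.is_displayed() and i < max_clicks:
        load_more_comment.click()
        time.sleep(1.5)
        load_more_comment = driver.find_element(By.CSS_SELECTOR, load_more_selector)
        i += 1
except Exception as e:
    print(e)

# Collect the author and text of every loaded comment.
user_names = []
user_comments = []
for c in driver.find_elements(By.CLASS_NAME, 'gElp9'):
    container = c.find_element(By.CLASS_NAME, 'C4VMK')
    name = container.find_element(By.CLASS_NAME, '_6lAjh').text
    content = container.find_element(By.TAG_NAME, 'span').text
    user_names.append(name)
    user_comments.append(content.replace('\n', ' ').strip())

# Drop the first entry (presumably the post's own caption rather than a comment).
if user_names:
    user_names.pop(0)
    user_comments.pop(0)

excel_exporter.export(user_names, user_comments)
driver.close()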
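To deal with the automated comments mentioned earlier, you can run a simple filter over the scraped lists before exporting them. This is a minimal sketch with a made-up heuristic: it drops very short comments and any text that several accounts posted word for word. The function name, thresholds, and sample data are assumptions, and real engagement groups will need stricter rules.

```python
from collections import Counter


def filter_comments(user_names, user_comments, min_words=2, max_repeats=1):
    """Keep (name, comment) pairs that look hand-written.

    A comment is dropped if it has fewer than `min_words` words or if the same
    text (ignoring case) appears more than `max_repeats` times in the thread.
    """
    normalized = [c.strip().lower() for c in user_comments]
    counts = Counter(normalized)
    kept = []
    for name, original, norm in zip(user_names, user_comments, normalized):
        if len(norm.split()) < min_words:
            continue
        if counts[norm] > max_repeats:
            continue
        kept.append((name, original))
    return kept


# Made-up sample thread.
names = ["anna", "bot_1", "bot_2", "carl", "dave"]
comments = [
    "Where can I buy this?",
    "Nice pic! Check my page",
    "nice pic! check my page",
    "Wow!",
    "Love the colour",
]
print(filter_comments(names, comments))
```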

Instagram Image Scraper

I need to mention here that scraping photos from Instagram is a bit different from all of the above since you need to get them from the web. Here’s the exact code we use for it:

https://git.nearshoremx.com/rcolin/demo-tv-azteca/blob/a8a46a9f6bb46f711d885ff1c8d6a2d9f5c60b45/env/lib/python3.6/site-packages/instagram_scraper/app.py

Export Instagram Likes

Although it is not possible to simply export likes from Instagram, we can crawl and provide them in CSV files. Here’s the exact code we use:

 

# Assumes `api` is an already constructed client object for the unofficial API
# that exposes the methods used below.
def get_likes_list(username):
    """Return the users who liked the most recent post of `username`."""
    api.login()
    api.searchUsername(username)
    username_id = api.LastJson['user']['pk']  # Get user ID
    api.getUserFeed(username_id)  # Get user feed
    media_id = api.LastJson['items'][0]['id']  # Get most recent post
    api.getMediaLikers(media_id)  # Get users who liked
    users_list = []
    for user in api.LastJson['users']:  # Push users to list
        users_list.append({'pk': user['pk'], 'username': user['username']})
    return users_list
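Turning that list into a CSV file takes only the standard library. The sketch below assumes the list holds dictionaries with `pk` and `username` keys, as built in `get_likes_list`; the helper name and the sample rows are made up.

```python
import csv
import io


def write_likers_csv(users, fileobj):
    """Write [{'pk': ..., 'username': ...}, ...] as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=["pk", "username"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(users)


# Made-up sample rows in the assumed shape.
sample_users = [{"pk": 1001, "username": "anna"}, {"pk": 1002, "username": "carl"}]

buffer = io.StringIO()
write_likers_csv(sample_users, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead:
# with open("likes.csv", "w", newline="", encoding="utf-8") as f:
#     write_likers_csv(sample_users, f)
```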

Scrape Instagram Hashtag

People use hashtags to describe everything in their image posts (mainly things and interests). It’s their way of increasing the reach of a post and getting more likes and followers. This sample uses Instaloader, a great Python project.

from instaloader import Instaloader

import engagement  # presumably a local helper module from the sample repo, not part of Instaloader

loader = Instaloader()
NUM_POSTS = 10  # despite the name, this caps the number of distinct users collected


def get_hashtags_posts(query):
    """Collect an engagement summary for the first NUM_POSTS distinct users posting under a hashtag."""
    posts = loader.get_hashtag_posts(query)
    users = {}
    count = 0
    for post in posts:
        profile = post.owner_profile
        if profile.username not in users:
            summary = engagement.get_summary(profile)
            users[profile.username] = summary
            count += 1
            print('{}: {}'.format(count, profile.username))
            if count == NUM_POSTS:
                break
    return users


if __name__ == "__main__":
    hashtag = "tacos"
    users = get_hashtags_posts(hashtag)
    print(users)

But this also means that we can easily see people that used a hashtag that’s relevant to our business. 

If I own a beard oil brand, would I like to contact these 200,000 people that used #beardfashion in their posts? Absolutely!

And there’s a hashtag for pretty much everything, so any niche business can find an audience that used relevant hashtags! You can check out the popularity of a hashtag with this tool.
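If you already have scraped captions, you can also measure which hashtags matter in your own data set. The sketch below counts distinct users per hashtag; the function name and the sample posts are made up, and it works purely on local data.

```python
import re
from collections import defaultdict

HASHTAG_RE = re.compile(r"#(\w+)")


def users_per_hashtag(posts):
    """Map each hashtag (lower-cased) to the set of usernames that used it."""
    users = defaultdict(set)
    for username, caption in posts:
        for tag in HASHTAG_RE.findall(caption):
            users[tag.lower()].add(username)
    return users


# Made-up (username, caption) pairs.
posts = [
    ("anna", "New oil day #BeardFashion #grooming"),
    ("carl", "Trim time #beardfashion"),
    ("anna", "Again #beardfashion"),
]
# Most-used hashtags first, ties broken alphabetically.
ranking = sorted(users_per_hashtag(posts).items(), key=lambda kv: (-len(kv[1]), kv[0]))
for tag, names in ranking:
    print(tag, len(names))
```

Counting distinct users rather than posts keeps one very active account from inflating a hashtag’s apparent reach.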

To Wrap Up

I hope this guide was useful for anyone looking to scrape data from Instagram. However, if you want to get data (any data from Instagram) without the nerve-wracking coding, book a call with me so we can find your ideal customers and get their data! Stay safe.
