October 8, 2020

Instagram Scraper | How to Scrape All of Instagram’s Data


In this guide, you’ll find out how to scrape Instagram data such as emails, phone numbers, bios, hashtags, or images, either with a scraping tool or service or by building your own scraper.

Here’s a summary of what we’ll cover:

  • Using an Instagram Scraping Tool
  • Using an Instagram Scraping Service
  • Building an Instagram Scraping Tool

Using an Instagram Scraping Tool

Instagram scraping tools are a great pick if you’re just experimenting with Instagram email scraping. But if you plan on scraping large amounts of data and emails, segmenting your prospects, and advertising to them via cold email or Facebook custom audiences, you might want to consider a scraping service instead.

To understand why, we’ve listed the pros and cons of Instagram scraping tools.

Pros and Cons of Instagram Scraping Tools

Pros:

  • Relatively easy to use
  • Cheap
  • Cloud Automated

Cons:

  • No targeting or segmentation
  • Fake Accounts and Bots
  • Invalid Emails, Spam Traps, Catch-all
  • Very limited data points

 

Very Limited Data Points

I’ve previously stated that you need gender, age, and location as the basis for any segmentation. These data points cannot be scraped directly from Instagram. However, there are workarounds to get to a certain level of segmentation.

For example, if you want to scrape people based only on location, you can search for local hashtags such as #NewYork, #Chicago, or #SanFrancisco.
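To make the hashtag workaround concrete, here is a minimal sketch that buckets a post caption into city segments based on the hashtags it contains. The city-to-hashtag mapping (`CITY_HASHTAGS`) and the function name are hypothetical examples, and the hashtag regex is a simplification of how Instagram actually parses hashtags.

```python
import re

# Hypothetical segment definitions: extend the hashtag sets for your own market.
CITY_HASHTAGS = {
    'New York': {'newyork', 'nyc'},
    'Chicago': {'chicago'},
    'San Francisco': {'sanfrancisco', 'sf'},
}

# Simplified hashtag pattern: '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r'#(\w+)')


def cities_for_caption(caption):
    """Return the (sorted) city segments whose hashtags appear in a caption."""
    tags = {t.lower() for t in HASHTAG_RE.findall(caption)}
    return sorted(city for city, city_tags in CITY_HASHTAGS.items() if tags & city_tags)
```

For example, `cities_for_caption("Sunset in #NYC and #SanFrancisco")` puts the post in both the New York and San Francisco segments. Matching is case-insensitive because captions mix `#NYC`, `#nyc`, and so on.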

To determine a profile’s gender, we use advanced AI. The AI checks every image from a profile previously collected by our research team, detects the face that appears most frequently, and analyzes it until it outputs the gender. We’re currently at an 8 out of 10 success rate, which is not the case with age.

Age is much more difficult for an AI to determine, since people do all sorts of things (like dyeing their hair) to hide their true age. So far, we can successfully determine age on 4 out of 10 Instagram profiles.

No targeting or segmentation

There’s a saying: “Data is the new oil.” Like oil, data is only valuable once it has been refined (segmented, analyzed) and is ready to use.

Most businesses target specific audiences and try to avoid people who don’t belong in these groups. Even if you scrape the followers of your main competitor (which is as targeted as you can get), there are still going to be a lot of people who don’t fit your target market, whether because of location, gender, age, etc.

Also, if the data is scraped to reach people by email or mobile, you need to be able to segment them to personalize the message. And again, the most basic filters for data segmentation are location, gender, and age.

 

Fake Accounts & Bots

The fame of Instagram influencers gave rise to the fake-follower industry. How?

Influencers are paid based on their follower counts and engagement stats, so when these numbers go up, so does their paycheck. That’s why many influencers of every size, be it mega, macro, micro, or nano, buy fake followers to artificially boost their popularity.

One study found that one in ten Instagram accounts is fake, amounting to more than 95 million fake accounts currently active.

The problem in our case is that fakes have become so good that it’s hard to tell them apart from real personal profiles. In recent years, they have also started including fake emails and phone numbers in their bios or on their profile pages.
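Spotting fakes automatically is hard, but a few crude red flags can be computed from fields the user-info endpoint returns (`follower_count`, `following_count`, `media_count`, `biography`, listed in the email scraper section). The sketch below is a hypothetical heuristic, not a real bot detector: the function name and thresholds are arbitrary assumptions you would need to tune, and a well-made fake will pass it.

```python
def looks_suspicious(user, max_follow_ratio=10.0, min_posts=3):
    """Return a list of red flags for one user-info record (empty list = none found).

    `user` is a dict of user-info fields, e.g. {'follower_count': 12, ...}.
    The default thresholds are illustrative guesses, not tuned values.
    """
    followers = user.get('follower_count', 0)
    following = user.get('following_count', 0)
    posts = user.get('media_count', 0)
    flags = []
    if posts < min_posts:
        flags.append('very few posts')
    # max(followers, 1) keeps the threshold above zero for accounts with no followers
    if following > max_follow_ratio * max(followers, 1):
        flags.append('follows far more accounts than follow it')
    # `or ''` also covers a biography field that is present but None
    if not (user.get('biography') or '').strip():
        flags.append('empty bio')
    return flags
```

Returning the reasons rather than a plain True/False makes it easier to review borderline accounts by hand instead of discarding them outright.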

This can ruin your scraping strategy and email campaign, since email providers like Google will notice that you’ve contacted too many nonexistent addresses and will send all your future emails to spam.

Fun fact: 45% of all “semi-famous” people from LA have, at some point, bought fake followers.

Invalid Emails, Spam Traps, Catch-All

It’s pretty much the same problem as with fake accounts: if you cannot detect invalid emails, spam traps, and catch-alls, you will end up in spam.

This is why you must use email validation software like NeverBounce. However, make sure to check out their pricing since it can get really expensive if you do this at scale.
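Before paying for validation at scale, you can at least strip obviously malformed addresses and duplicates yourself, so you aren’t paying to check them. This minimal sketch uses a deliberately loose, assumed pattern (something@domain.tld). It cannot detect spam traps or catch-alls, which still require a validation service. It also lowercases the whole address, which is common practice but not strictly correct for the part before the @.

```python
import re

# Deliberately loose check: no spaces, exactly one @, and a dot in the domain.
EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def clean_email_list(emails):
    """Trim and lowercase addresses, dropping malformed ones and duplicates (order kept)."""
    seen = set()
    cleaned = []
    for raw in emails:
        email = raw.strip().lower()
        if EMAIL_RE.match(email) and email not in seen:
            seen.add(email)
            cleaned.append(email)
    return cleaned
```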

What are the best scraping tools?

There isn’t a single “best” scraping tool as they all have at least a few subtle differences like pricing, daily scraping volume, data points, etc. We’ve picked some of the most popular below, but you’ll have to decide for yourself which one is best for your use case.

  • Phantombuster
  • Brightdata
  • Socialscrape
  • Apify
  • Botster.io

Using an Instagram Scraping Service

Instagram scraping services are the safest and most reliable option for scraping Instagram data and emails. Most scraping services deliver on all the pros of scraping tools, but take the cons out of the equation. 

With scraping services, or as some call them data providers, you don’t need to do the scraping “yourself” and wait hours on a confusing dashboard for it to take place. They are powerhouses that have thousands of Instagram accounts and proxies and do Instagram scraping at an enormous scale.

They cater to businesses that intend to use the data for large-scale email marketing or Facebook advertising (via custom audiences). Most of the time, these are e-commerce, SaaS, and creator economy brands, and even marketing agencies.

Pros:

  • Easy to Use
  • Instant Delivery
  • Comprehensive Targeting and Segmentation
  • Unlimited Scraping Volume
  • Cleaned from Fake Accounts and Bots
  • Cleaned from Invalid Emails, Spam Traps, Catch-all
  • Enriched with Many Data Points

Cons:

  • Slightly more expensive than scraping tools

What is the best Instagram scraping service?

Here at Influencers Club, we own the largest database of Instagram data and emails on the entire planet. We’ve been scraping day and night for the past 2 years, and have helped more than 600 big-time clients grow their businesses through our data (don’t take our word for it, check out what they had to say on the customers page).

We provide the most comprehensive data on the market with more than 30 data points that can be layered as filters to achieve laser-targeted email lists. These data points are also included with each email on our lists. They include:

  • Age
  • Gender
  • Location
  • Interests
  • Ethnicity
  • Engagement Rate
  • Follower Count
  • 20+ more

Simply share your targeting criteria in the form below, or on a free strategy call, and you’ll receive the most targeted list of Instagram emails possible.

Building an Instagram Scraper Tool

*** Important note: Please be advised that automatically accessing Instagram is against their terms of service.

This is just an educational resource made for developers. For those of you who want to be 100% on the legal side of things, get in touch with us and just buy contact data from vetted Instagram users and creators curated by real people and enriched by APIs. 

1. How to Scrape Instagram Data Using Python

To access the unofficial Instagram API, we use its mobile endpoints through Python, PHP, or really anything that can log in to Instagram accounts and scrape the data. Logging in takes only a few lines of Python code (see the example below), but there is much more to it. Here is how you can log in to a profile to use the unofficial Instagram API.

All code samples are taken from public GitHub repos, like the one listed here.

import json

import requests

# Method excerpt from the repo's API class: _sendrequest, _generatesignature,
# generate_uuid, InstagramAPIBase and AuthenticationError are defined there.
def login(self, force=False):
    """
        Authenticate this API instance.
        If already logged in (and not later logged out) does nothing (unless forced).
    :param force: if true, will attempt to log in even if already logged in.
    :return: dictionary of responses.
    """
    if not self._isloggedin or force:
        self._session = requests.Session()
        # if you need a proxy, do something like this:
        # self._session.proxies = {"https": "http://proxyip:proxyport"}
        full_response = self._sendrequest(
            'si/fetch_headers/?challenge_type=signup&guid=' + self.generate_uuid(False), login=True)
        data = {
            'phone_id': self.generate_uuid(True),
            '_csrftoken': full_response.cookies['csrftoken'],
            'username': self._username,
            'guid': self._uuid,
            'device_id': self._deviceid,
            'password': self._password,
            'login_attempt_count': '0'}
        try:
            full_response = self._sendrequest(
                'accounts/login/',
                post=self._generatesignature(json.dumps(data)),
                login=True)
        except InstagramAPIBase._2FA_Required as exception:
            # In order to login, need to provide the second factor (i.e. SMS code or backup code).
            # Use call-back to get this string.
            if not self._two_factor_callback:
                raise AuthenticationError("This account requires support for Two-Factor Authentication")
            two_factor_info = exception.two_factor_info
            verification_string = self._two_factor_callback(two_factor_info)
            data = {
                'verification_code': verification_string,
                'two_factor_identifier': two_factor_info['two_factor_identifier'],
                '_csrftoken': full_response.cookies['csrftoken'],
                'username': self._username,
                'device_id': self._deviceid,
                'password': self._password,
            }
            full_response = self._sendrequest(
                'accounts/two_factor_login/',
                post=self._generatesignature(json.dumps(data)),
                login=True)
        self._isloggedin = True
        decoded_text = json.loads(full_response.text)
        self._loggedinuserid = decoded_text["logged_in_user"]["pk"]
        self._ranktoken = "%s_%s" % (self._loggedinuserid, self._uuid)
        self._csrftoken = full_response.cookies["csrftoken"]
        return decoded_text

The login request itself is the easy part; a lot more goes on in the background. You have to use a proxy (preferably one in the location you are already in), complete any captchas if necessary, and then let the profile rest for a few days before you begin scraping, because Instagram is now “watching you”.

The login request is probably the most problematic one, because it triggers Instagram’s algorithm to keep an eye on you, so please be careful.

2. Instagram Profiles for Scraping the Data

Maybe you are only starting out with email/SMS marketing and don’t need a huge amount of data. Good for you, because the safest way to scrape data is to do it by hand: just visit profiles and copy-paste their info. It is safe because Instagram cannot flag it as a violation; peeking into people’s profiles is something everyone does.

You can automate it, of course, although if you work at a serious company, we wouldn’t recommend it.

The goal is to simulate human behavior on Instagram’s mobile app while extracting every available data point from the profile. You also need many profiles, because Instagram’s API call limit is VERY low: the scraping capacity of a single profile is really small, and if we don’t run with 1,500 accounts, the crawling takes forever.

Now, if you want to buy Instagram profiles and scrape yourself, there are two important things to note:

  1. ALWAYS use aged Instagram profiles and have phone numbers you can verify them with; otherwise they’ll be banned almost immediately

  2. NEVER use your personal profile

You can purchase Instagram profiles from:

  • Facebook pages

  • Instagram direct messages

  • and even dedicated online marketplaces across the world

But even if you manage to buy and log in with all those profiles, you’ll still face many challenges. Instagram is pretty smart nowadays and can recognize profiles bought on the so-called gray market. However, a few sellers are really good at creating fakes that are hard to detect. To find one of these pros, just look for the most expensive sellers on the market.

3. Proxies for Remaining Undetected

A proxy is a third-party server that routes your requests through itself, so they carry its IP address instead of yours. When you use a proxy, Instagram no longer sees your IP address but the proxy’s, which lets you scrape from one server. But remember not to put too many profiles behind a single IP: logging in to more than 5 profiles on the same IP is a huge no-no.

Just as with Instagram profiles, the same problem occurs with proxies. Instagram recognizes thousands of proxy providers, and until you find a good one, you’re in a lot of trouble: if Instagram bans the proxy you use, the associated Instagram profile is automatically lost too. To check whether your proxy provider is still off the radar, paste your proxy’s IP into this website. If it’s a known provider, it will be listed there, and if this website knows, trust me, the all-seeing Zuckerberg eye knows too.

The Pros and Cons of Building an Instagram Data Scraper

The benefits of having an in-house Instagram scraper are:

  • Full control of the whole process

  • The contact data you acquire can be resold or rented

  • You can use the data to scale your business

However, there are also some serious drawbacks:

  • No targeting or segmentation

  • In clear violation of Instagram’s ToS

  • Fake Accounts and Bots

  • Invalid Emails, Spam Traps, Catch-all

  • Security Risks

  • Very limited data points

(We’ll focus on the ones we didn’t cover earlier)

Security Risks

 

The number one weapon Instagram uses to detect bot activity is its API call limit. If you go over the limit, you get blocked instantly, even if your account is aged.

Buying Scraped Instagram Data

I would say that building your own Instagram scraper only makes sense for indie developers. For startups or bigger companies, it’s just not feasible to go through this entire process only to scrape at a level that won’t bring the much-needed ROI, and no one should risk a lawsuit from Instagram.

An easier solution is to just buy vetted data from Instagram creators (like emails and phone numbers) and immediately use it for marketing purposes.

Email marketing is the most profitable channel out there. With our help, 200+ businesses have entirely cut out their ad spend on Google and Facebook Ads and relied solely on the email lists we provide.

Making email marketing the only channel for acquiring new customers is something that was not possible before. But Instagram is a platform where people share a lot of information about their life. Our secret sauce is that we’ve learned not only to manually “scrape” people’s data but to also understand them. This allowed us to offer businesses contact data of people that are very similar to their ideal customers. And with a database of millions, we can do that at an unprecedented scale.

If you’re interested in getting email lists of people that are from your niche, book a free strategy call with us here.

How to Create an Instagram Email Scraper (GitHub Repo)

Use this GitHub repo for the code samples!

**Note: All of these code samples were working when we wrote this blog post. We can’t guarantee that they will keep working, because, as we said earlier, Instagram constantly changes its code, and you need a dedicated developer if you want to do this correctly.

Once you are logged in to an Instagram account through a specific proxy, getting the data should be “easy enough”. You only need the API endpoints.

The endpoint for email is: /api/v1/users/{{user_id}}/info/

  • user.public_email: email address
  • user.username: username
  • user.is_private: whether the account is private
  • user.full_name: user’s full name
  • user.profile_pic_url: user’s profile photo URL
  • user.biography: user’s bio
  • user.external_url: user’s website
  • user.follower_count: follower count
  • user.following_count: following count
  • user.media_count: number of posts

These are all the data points you’ll get from that API call.
Check out how you can boost the ROI of your email marketing by 700% in this Instagram email scraper article.

Developing an Instagram Phone Number Scraper or Extractor

Believe it or not, an Instagram phone number extractor uses the same API call as the email scraper, provided the user has made their number publicly available.

/api/v1/users/{{user_id}}/info/

So in addition to the data points mentioned above, you’ll also get the phone number.

You should know that only 10-15% of all Instagram users share their personal phone numbers publicly. Although the percentage is relatively small, don’t forget that 15% of all Instagram users is roughly 150 million people.

Scrape Comments From Instagram

One thing I really hate about scraping comments from Instagram is that you get a ton of automated comments and comments from engagement groups, so make sure to watch out for those. Anyhow, I’m feeling generous, so let me tell you how to scrape comments with:

  1. The mobile API (like the Instagram phone number extractor and email scraper code above)
  2. The web API (limited but super fast and easy to do)

For 1), use /api/v1/media/{{post_id}}/comments/. You’ll get all of the comments, assuming you have the post ID. That’s easy to find if you’ve already scraped the posts; otherwise, open the photo in your browser and copy the ID from there.

As for 2), take a look at the code sample below. It should get you from zero to scraping comments (unless Instagram changes its code, and it always does!).

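import sys
import time

from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

import excel_exporter  # helper module from the repo

# `driver` is a Selenium WebDriver created earlier in the script, with the post
# already open. The class names are Instagram's obfuscated CSS classes from when
# this was written and change often. sys.argv[2] = max "load more" clicks.
LOAD_MORE = '.MGdpg > button:nth-child(1)'
max_clicks = int(sys.argv[2])

# Close the dialog if its close button is present.
try:
    driver.find_element(By.CLASS_NAME, 'xqRnw').click()
except WebDriverException:
    pass

# Click "load more comments" until the button disappears or max_clicks is hit.
try:
    load_more_comment = driver.find_element(By.CSS_SELECTOR, LOAD_MORE)
    print("Found {}".format(load_more_comment))
    i = 0
    while load_more_comment.is_displayed() and i < max_clicks:
        load_more_comment.click()
        time.sleep(1.5)  # let the next batch of comments load
        load_more_comment = driver.find_element(By.CSS_SELECTOR, LOAD_MORE)
        print("Found {}".format(load_more_comment))
        i += 1
except WebDriverException as e:
    print(e)  # usually the button is gone because all comments are loaded

user_names = []
user_comments = []
for c in driver.find_elements(By.CLASS_NAME, 'gElp9'):
    container = c.find_element(By.CLASS_NAME, 'C4VMK')
    name = container.find_element(By.CLASS_NAME, '_6lAjh').text
    content = container.find_element(By.TAG_NAME, 'span').text
    content = content.replace('\n', ' ').strip()  # flatten multi-line comments
    user_names.append(name)
    user_comments.append(content)

# Drop the first entry (likely the post caption rather than a comment).
user_names.pop(0)
user_comments.pop(0)

excel_exporter.export(user_names, user_comments)
driver.close()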
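Once the comments are collected, you can screen out some of the automated and engagement-group comments mentioned above. The sketch below is a hypothetical heuristic that works on the `user_names` and `user_comments` lists the script builds: it drops very short comments and any text posted verbatim by more than a few accounts. The function name and thresholds are assumptions, not tested values.

```python
from collections import Counter


def filter_comments(names, comments, min_words=3, max_repeats=2):
    """Return (name, comment) pairs, dropping very short or mass-repeated comments."""
    # How many times each comment text appears, ignoring case.
    counts = Counter(c.lower() for c in comments)
    kept = []
    for name, comment in zip(names, comments):
        if len(comment.split()) < min_words:
            continue  # "Nice!", emoji-only replies and similar
        if counts[comment.lower()] > max_repeats:
            continue  # identical text from many accounts
        kept.append((name, comment))
    return kept
```

You would call it as `filter_comments(user_names, user_comments)` before exporting. Short but genuine comments are lost too, so lower `min_words` if that matters for your use case.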

Instagram Image/Photo Scraper

I need to mention that scraping photos from Instagram is hard to pull off, since you need to get them from the web. Here’s a GitHub script that might be helpful, though (we’ve never used it, so we can’t vouch for it):

Instagram Image Scraper – Code

Export Instagram Likes

Although Instagram doesn’t let you simply export likes, you can crawl them and save them to CSV files. Here’s a code snippet that collects the users who liked an account’s most recent post:

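# Written against an unofficial Instagram API client (the method names appear
# to match LevPasha's Instagram-API-python). `api` is an InstagramAPI instance
# created with your username and password.
def get_likes_list(api, username):
    """Return [{'pk': ..., 'username': ...}] for everyone who liked the user's latest post."""
    api.login()
    api.searchUsername(username)
    username_id = api.LastJson['user']['pk']  # Get user ID
    api.getUserFeed(username_id)  # Get user feed
    media_id = api.LastJson['items'][0]['id']  # Get most recent post
    api.getMediaLikers(media_id)  # Get users who liked it
    users_list = []
    for user in api.LastJson['users']:  # Push users to list
        users_list.append({'pk': user['pk'], 'username': user['username']})
    return users_list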
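The snippet collects likers as a list of `{'pk': ..., 'username': ...}` dicts. To turn that into the CSV file mentioned above, Python’s standard `csv` module is enough. A minimal sketch (the function and file names are hypothetical):

```python
import csv


def write_likers_csv(users_list, path):
    """Write [{'pk': ..., 'username': ...}, ...] to a CSV file with a header row."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['pk', 'username'])
        writer.writeheader()
        writer.writerows(users_list)
```

Usage: `write_likers_csv(users_list, 'likers.csv')`. `newline=''` is what the `csv` docs recommend so the writer controls line endings itself.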

Scrape Instagram Hashtags

People use hashtags to describe everything in their image posts (mainly things and interests). It’s their way of increasing the reach of their posts and getting more likes and followers. This Instagram hashtag scraper uses Instaloader, a great Python project.

from instaloader import Instaloader

import engagement  # helper module from the repo: get_summary(profile)

loader = Instaloader()
NUM_POSTS = 10  # stop after this many *unique* post owners


def get_hashtags_posts(query):
    """Return {username: engagement summary} for people who posted with #query."""
    posts = loader.get_hashtag_posts(query)  # lazily yields Post objects
    users = {}
    count = 0
    for post in posts:
        profile = post.owner_profile
        if profile.username not in users:  # skip repeat posters
            users[profile.username] = engagement.get_summary(profile)
            count += 1
            print('{}: {}'.format(count, profile.username))
            if count == NUM_POSTS:
                break
    return users


if __name__ == "__main__":
    hashtag = "tacos"
    users = get_hashtags_posts(hashtag)
    print(users)

This also means that we can easily find people who used a hashtag relevant to our business.

If I owned a beard oil brand, would I like to contact the 200,000 people who used #beardfashion in their posts? Absolutely!

And there’s a hashtag for pretty much everything, so any niche business can find an audience that used relevant hashtags! You can check out the popularity of a hashtag with keywordtool.io.
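To find more hashtags worth targeting, you can look at which hashtags show up alongside a seed hashtag. Given a list of captions (Instaloader exposes a post’s caption as `Post.caption`), this minimal sketch counts co-occurring hashtags. The function name and the simplified regex are assumptions, and the counts only reflect the sample of posts you feed it.

```python
import re
from collections import Counter

# Simplified hashtag pattern: '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r'#(\w+)')


def related_hashtags(captions, seed, top_n=5):
    """Rank hashtags that appear in the same captions as #seed (case-insensitive)."""
    seed = seed.lower().lstrip('#')
    counts = Counter()
    for caption in captions:
        tags = {t.lower() for t in HASHTAG_RE.findall(caption)}
        if seed in tags:
            counts.update(tags - {seed})  # each tag counted once per caption
    return counts.most_common(top_n)
```

For instance, feeding it captions from #beardfashion posts would surface neighbors like #beardoil that you could then scrape as well.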

Skip Building an Instagram Data Scraper

We hope this guide was useful for anyone looking to scrape data from Instagram. However, if you want to get data (any data from Instagram) without the nerve-wracking coding, and in a 100% compliant way, book a call with me so we can find your ideal customers and get their data! Stay safe.


If you are interested in learning more about how you can get validated Instagram data, make sure to schedule a quick call with our team, where you can discuss the details related to your targeting.
