Working with the Google Analytics API from Python

To analyse data from your website (users, sessions, page views), many would choose Google Analytics, or GA (https://analytics.google.com/). To obtain data you can go to the GA website, choose the metrics and time frame you need, and download the data in csv or xlsx format. However, there is also an API that imports data from GA directly into a data-analysis script (e.g. within a Jupyter notebook), so why not explore it instead?

How do you connect to the GA API and use data from it? Great step-by-step instructions are available on Google's API websites (https://developers.google.com/sheets/api/quickstart/python), and I will just combine all the needed info in one place.

First of all:

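  1. Go to this page: https://console.developers.google.com/flows/enableapi?apiid=sheets.googleapis.com to create a project and (automatically) turn on the API. Note that the apiid in this link is for the Sheets API, so also enable the Analytics API for the same project in the console's API library. Click Continue, then click Go to credentials.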
  2. On the Add credentials to your project page, click the Cancel button.
  3. At the top of the page, select the OAuth consent screen tab. Select an Email address, enter a Product name if not already set, and click the Save button.
  4. Select the Credentials tab, click the Create credentials button and select OAuth client ID.
  5. Select the application type Other, enter a name and click the Create button.
  6. Dismiss the resulting dialog.
  7. Click the Download JSON button to the right of the client ID.
  8. Move this file to your working directory and rename it to client_secrets.json
  9. On your computer, install google-api-python-client together with oauth2client, which the code below imports:
    pip install --upgrade google-api-python-client oauth2client

    or

    sudo easy_install --upgrade google-api-python-client oauth2client

and you’re good to go.
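Before sending the first request, it can save some head-scratching to check that the downloaded file is really where the script expects it. The snippet below is only a minimal sketch: the helper name check_client_secrets is my own, and it assumes the downloaded file keeps its settings under a top-level "installed" (or "web") key with client_id and client_secret inside, which is how such files usually look.

```python
import json
from pathlib import Path


def check_client_secrets(path="client_secrets.json"):
    """Return the OAuth client type found in the file ("installed" or "web").

    Hypothetical helper, not part of any Google library.
    """
    secrets_path = Path(path)
    if not secrets_path.is_file():
        raise FileNotFoundError(
            "%s not found; download it from the console first" % secrets_path)

    with secrets_path.open(encoding="utf-8") as handle:
        secrets = json.load(handle)

    # Assumption: the settings sit under a single top-level key
    for client_type in ("installed", "web"):
        section = secrets.get(client_type)
        if section and "client_id" in section and "client_secret" in section:
            return client_type

    raise ValueError(
        "%s does not look like an OAuth client secrets file" % secrets_path)
```

Calling check_client_secrets() from your working directory either returns the client type or raises an error that tells you what is missing.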

Now how to connect to GA and send your first request:

(the example is based on https://github.com/EchoFUN/GAreader/blob/master/hello_analytics_api_v3.py)

from googleapiclient.errors import HttpError
from googleapiclient import sample_tools
from oauth2client.client import AccessTokenRefreshError

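def ga_request(input_dict):
    """Authorize against the GA API and run the query described by input_dict."""
    # sample_tools.init reads client_secrets.json from the directory of this
    # file and caches the resulting token in analytics.dat for later runs
    service, flags = sample_tools.init(
        [], 'analytics', 'v3', __doc__, __file__,
        scope='https://www.googleapis.com/auth/analytics.readonly')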

    try:
        first_profile_id = get_first_profile_id(service)
        if not first_profile_id:
            print('Could not find a valid profile for this user.')
        else:
            results = get_top_keywords(service, first_profile_id, input_dict)
            return results

    except TypeError as error:
        # Raised by the client library when the query parameters are malformed
        print('There was an error in constructing your query : %s' % error)

    except HttpError as error:
        print('Arg, there was an API error : %s : %s' %
              (error.resp.status, error._get_reason()))

    except AccessTokenRefreshError:
        print('The credentials have been revoked or expired, please re-run '
              'the application to re-authorize')

def get_first_profile_id(service):
    accounts = service.management().accounts().list().execute()
    if accounts.get('items'):
        firstAccountId = accounts.get('items')[0].get('id')
        webproperties = service.management().webproperties().list(
            accountId=firstAccountId).execute()

        if webproperties.get('items'):
            firstWebpropertyId = webproperties.get('items')[0].get('id')
            profiles = service.management().profiles().list(
                accountId=firstAccountId,
                webPropertyId=firstWebpropertyId).execute()

            if profiles.get('items'):
                return profiles.get('items')[0].get('id')

    return None

def get_top_keywords(service, profile_id, input_dict):
    """Run the query; profile_id is unused because ids comes from input_dict."""
    params = {
        'ids': input_dict['ids'],
        'start_date': input_dict['start_date'],
        'end_date': input_dict['end_date'],
        'metrics': input_dict['metrics'],
        'dimensions': input_dict['dimensions'],
    }
    # An empty filters string is not a valid value, so only pass it when set
    if input_dict['filters'] != '':
        params['filters'] = input_dict['filters']
    return service.data().ga().get(**params).execute()

Save this file as ga_api_example.py

Keep in mind that in the API, metrics and other fields can have different names from those shown on the GA website, e.g. custom dimensions are simply called ga:dimension[no] (such as ga:dimension1).
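To avoid typing these names by hand, a few small helpers can build them. This is just a sketch of the naming convention; the function names ga_name, custom_dimension and join_names are made up for this post and are not part of the Google client library.

```python
def ga_name(name):
    """Prefix a metric or dimension name with "ga:" unless it already has it."""
    return name if name.startswith("ga:") else "ga:" + name


def custom_dimension(index):
    """API name of the custom dimension with the given index (1 -> ga:dimension1)."""
    if index < 1:
        raise ValueError("custom dimension indexes start at 1")
    return ga_name("dimension%d" % index)


def join_names(names):
    """Comma-join several names, the form the metrics and dimensions fields take."""
    return ",".join(ga_name(name) for name in names)
```

For example, join_names(["pageviews", "sessions"]) gives a string that can go straight into the "metrics" field of a request.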

And now, after importing your file:

from ga_api_example import ga_request

you can prepare a request:

request = {
    "ids": "ga:<your_id>",
    "start_date": "2017-06-25",
    "end_date": "2017-06-25",
    "metrics": "ga:pageviews",
    "filters": "ga:dimension1=~yes",
    "dimensions": ""
}

(=~ means a regular-expression match, as in the reports on the GA website)

and execute it:

data = ga_request(request)
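The "filters" string in such a request can use more than =~. Below is a rough sketch of helpers for building filter strings; the names dimension_filter, all_of and any_of are my own. It assumes the dimension operators ==, !=, =~, !~, =@ and !@, that a semicolon means AND and a comma means OR (with OR binding tighter), and that commas and semicolons inside the expression need a backslash escape.

```python
DIMENSION_OPERATORS = ("==", "!=", "=~", "!~", "=@", "!@")


def dimension_filter(name, operator, expression):
    """Build a single filter such as ga:dimension1=~yes."""
    if operator not in DIMENSION_OPERATORS:
        raise ValueError("unknown dimension operator: %r" % operator)
    # Assumption: "," and ";" are reserved, so escape them in the expression
    escaped = expression.replace(",", "\\,").replace(";", "\\;")
    return "%s%s%s" % (name, operator, escaped)


def all_of(*filters):
    """Combine filters with AND (semicolon)."""
    return ";".join(filters)


def any_of(*filters):
    """Combine filters with OR (comma)."""
    return ",".join(filters)
```

With these, dimension_filter("ga:dimension1", "=~", "yes") reproduces the filter used in the request above, and all_of(...) or any_of(...) can chain several of them.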

Now you have the data in your Python script and can work with it, e.g. with pandas.
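The returned object is a nested dictionary, so a small conversion step helps before analysis. Here is a minimal sketch, assuming the response carries a "columnHeaders" list (each entry with "name" and "dataType") and a "rows" list of string values; the function name results_to_records and the type mapping are my own choices. It also returns an empty list when ga_request gave back None after an error.

```python
# Assumed mapping from the dataType reported in a header to a Python type
NUMERIC_TYPES = {
    "INTEGER": int,
    "FLOAT": float,
    "PERCENT": float,
    "TIME": float,
    "CURRENCY": float,
}


def results_to_records(results):
    """Turn a query response into a list of {column name: value} dicts."""
    if not results:
        return []
    headers = results.get("columnHeaders", [])
    records = []
    # "rows" can be absent when the query matched no data
    for row in results.get("rows", []):
        record = {}
        for header, value in zip(headers, row):
            convert = NUMERIC_TYPES.get(header.get("dataType"), str)
            record[header["name"]] = convert(value)
        records.append(record)
    return records
```

The list of dictionaries can then be handed to pandas, e.g. pandas.DataFrame(results_to_records(data)).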
