Data Analysis with Python

Unit 2: Machine Learning for Society and Culture

Background

Across two units that build on each other, this course offers you the chance to develop your own data analysis and machine learning skills in the field known as data analysis or, sometimes, data analytics (https://en.wikipedia.org/wiki/Analytics).

Analytics has a strong tradition, well-established methods, and a broad range of applications in culture, the economy, and society. Particularly relevant for us are two traditions: social media analytics and cultural analytics. Social media analytics (https://en.wikipedia.org/wiki/Social_media_analytics) extends general analytics toward the analysis of social (media) relationships. Cultural analytics (https://en.wikipedia.org/wiki/Cultural_analytics) builds on established analytical methods and applies them to the study of cultural objects.

In this course, you will gain knowledge about the theoretical and practical foundations of analysing socio-cultural objects. While experimenting, you will also learn about the limits of current methods as well as the exciting opportunities they offer.

Before you reach these opportunities, the first unit takes you through the relevant foundations of Python. This might seem abstract at times, but we promise to get you to the interesting parts as quickly as possible. The second unit presents digital methods ranging from association mining to network analysis.

Learning objectives

After completing this course, the learner will:

  • Understand the key principles behind data analysis with Python for the humanities

  • Gain knowledge about key concepts behind data analysis with Python

  • Critically evaluate methods for data analysis and apply them to a wide range of fields in the humanities

  • Develop skills around critical dataset assessment

  • Develop research questions and strategies that can be addressed with data analysis

  • Write up reports on the outcomes of a data analysis

Before you get started

You will be working with Jupyter notebooks (https://jupyter.org), brought together as a GitBook. Jupyter is a web-based computing environment, often used with the Python programming language, although other languages are also available. Jupyter notebooks contain both computer code and rich text elements, including visualisations, maps or interactive elements. Content is organised in cells, which are the building blocks of notebooks for both code and text. This introduction is the only notebook that does not contain code. Structured into cells, notebooks are both human-readable and executable documents for data analysis: you can read a research story and interact with its code and methods at the same time. Sometimes, the code cells will already be prepared for you. At other times, you will need to find your own solution.

If you feel uncertain about the environment, there are many video tutorials online, such as https://www.youtube.com/watch?v=H9Iu49E6Mxs.

The videos also explain how to install the environment on your own machine. You don't need to do this, as you can run the notebooks directly in either Binder (https://mybinder.org/) or Google Colab (https://colab.research.google.com/). There should be links at the top of your notebooks that open them in these environments.

Please note that you might need an account for these environments. Once you are set up, everything should run smoothly.

But if you want to run the environment on your own machine, go to https://www.anaconda.com/ and follow the instructions at https://www.anaconda.com/products/distribution to install Python and Jupyter notebooks. We would not recommend this for beginners, though, as it requires some understanding of command lines, environments, etc. For more details, check out https://docs.jupyter.org/en/latest/running.html. This is how you download a file from GitHub: https://www.wikihow.com/Download-a-File-from-GitHub. At the beginning, it is better to focus on the Python programming. That is hard enough.

Let’s get started …

Unit 2: Machine Learning for Society and Culture

The second unit of our series introducing interactive data analysis with Python to humanities students works through several research use cases using basic machine learning. We will, e.g., use association mining to analyse communities of practice. In the first session, we will explore political opinions in the US Congress and how users share interests. Later, we employ network analysis to split a small community network into groups and clusters, before finally learning more about visualisation and image analysis.

Please note that you need to complete the first unit of this course before you start this unit.

Before you start, take a look at the overview below to see how the various sessions fit together.

Political Communities

We start by exploring political and social community data with popular exploration techniques such as association mining and clustering. These techniques are very effective and easy to use, and they will give us some real insights into US politics as well as other communities of practice.
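To give a first impression of what association mining computes, here is a minimal sketch in plain Python. It assumes a made-up set of voting records (the bill names are hypothetical, not real US Congress data) and calculates support and confidence, the two basic measures behind association rules: support is the share of records that contain an itemset, and confidence estimates how often the right-hand side of a rule appears when its left-hand side does. The session itself may use a dedicated library and a different workflow.

```python
from itertools import combinations

# Hypothetical toy data: each set lists the bills a made-up legislator voted "yes" on.
votes = [
    {"bill_a", "bill_b", "bill_c"},
    {"bill_a", "bill_b"},
    {"bill_a", "bill_c"},
    {"bill_b", "bill_c"},
    {"bill_a", "bill_b", "bill_c"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A | C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


# Enumerate simple one-to-one rules {x} -> {y} in both directions for every pair of items.
items = sorted(set().union(*votes))
for x, y in combinations(items, 2):
    for a, c in ((x, y), (y, x)):
        s = support({a, c}, votes)
        conf = confidence({a}, {c}, votes)
        print(f"{{{a}}} -> {{{c}}}: support={s:.2f}, confidence={conf:.2f}")
```

With this toy data, each bill has support 0.8 and each pair 0.6, so every rule such as {bill_a} -> {bill_b} has confidence 0.75. Real analyses typically keep only rules above chosen support and confidence thresholds.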

Start the session from here.

Web Scraping

The web has become a unique source of data for community analysis, offering large amounts of it. A big problem with web data, however, is that it is often inconsistent and heterogeneous. To access it, one often has to visit multiple websites and combine their data. In this session on web scraping, we discuss some ways and tools for doing so.
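A minimal sketch of the scrape-and-combine idea, using only Python's standard library: `html.parser.HTMLParser` extracts links from two page snippets and merges them into one list of records tagged with their source. The HTML snippets and the `LinkCollector` class are hypothetical illustrations, and the session itself may rely on other tools. To work with a live page, you would first download its HTML, for example with `urllib.request.urlopen`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical page snippets standing in for two differently structured web sites.
pages = {
    "site_one": '<ul><li><a href="/members/1">Alice</a></li><li><a href="/members/2">Bob</a></li></ul>',
    "site_two": '<p>See <a href="https://example.org/groups">our groups</a>.</p>',
}

# Parse each page and assemble the results into one consistent list of records.
records = []
for source, html in pages.items():
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        records.append({"source": source, "href": href, "text": text})

for record in records:
    print(record)
```

The point of the common record format is that data from heterogeneous sites ends up in one uniform structure that later analysis steps can rely on.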

Start the session from here.

Graphs

Graphs from Data

In the following two sessions, we start digging into the details of (social) network analysis using a famous example from anthropology. Zachary's karate club network is an early and well-known example of social network analysis.

In the first session, we learn how to transform datasets into networks.
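Transforming a dataset into a network can be as simple as reading each row as an edge. The sketch below assumes a hypothetical CSV table of pairwise interactions, builds an undirected graph as an adjacency list using only the standard library, and computes each node's degree. Libraries such as NetworkX provide ready-made graph classes for this, and the sessions may use them, but this example does not depend on them.

```python
import csv
import io
from collections import defaultdict

# Hypothetical dataset: each row records that two people interacted.
data = """person,contact
Ana,Ben
Ana,Cleo
Ben,Cleo
Cleo,Dev
"""

# Build an undirected graph as an adjacency list: node -> set of neighbours.
graph = defaultdict(set)
for row in csv.DictReader(io.StringIO(data)):
    a, b = row["person"], row["contact"]
    graph[a].add(b)
    graph[b].add(a)

# Degree = number of neighbours, a first simple measure of how connected a node is.
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
print(degrees)
```

For this toy table, Cleo has the highest degree (3), since she is connected to Ana, Ben and Dev.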

Start the session from here.

Visualising and Analysing Graphs

In the second session, we learn how to visualise and analyse graphs.

Start the session from here.

Visuals and Arts

Advanced Visualisation

Python provides more advanced tools for data visualisation, which we explore in this session. Visualisation is a powerful way of exploring datasets.

Start the session from here.

Image Analysis

So far, this unit has mainly shown how to produce visualisations of data. This is important, but images and other multimedia can also be objects of analysis in their own right. In this session, we therefore work through a few simple exercises with images.

Start the session from here.