Data Analysis with Python¶
Unit 2 Machine Learning for Society and Culture¶
Background¶
In two units that build upon each other, we offer you a chance to develop your own data analysis and machine learning skills in a field known as data analysis or, sometimes, data analytics (https://en.wikipedia.org/wiki/Analytics).
Analytics has a strong tradition, well-established methods, and a broad range of applications in culture, economy, and society. Particularly relevant for us are the two traditions of social media analytics and cultural analytics (or cultural analysis). Social media analytics (https://en.wikipedia.org/wiki/Social_media_analytics) extends general analytics toward the analysis of social (media) relationships. Cultural analytics (https://en.wikipedia.org/wiki/Cultural_analytics) builds on existing successful analytical methods and applies them to the study of cultural objects.
In the course, you will gain knowledge about the theoretical and practical foundations of analysing socio-cultural objects. While experimenting, you will also learn about the limits of current capabilities as well as the exciting opportunities.
Before you reach these opportunities, we will take you through the relevant foundations of Python in the first unit. This might seem abstract at times, but we promise to take you to the interesting parts as quickly as possible. The second unit presents digital methods from association mining to network analysis.
Learning objectives¶
After completing this course, the learner will:
Understand the key principles behind data analysis with Python for the humanities
Gain knowledge about key concepts behind data analysis with Python
Critically evaluate methods for data analysis and apply them to a wide range of fields in the humanities
Develop skills around critical dataset assessment
Develop research questions and strategies that can be addressed with data analysis
Write up reports on the outcomes of a data analysis
Before you get started¶
You will be working with Jupyter notebooks (https://jupyter.org), brought together as a GitBook. Jupyter is a web-based computing environment, often used with the Python programming language, though other languages are also available. Jupyter notebooks contain both computer code and rich text elements, including visualisations, maps, and interactive elements. Content is organised in cells, which are the building blocks of notebooks for both code and text. This introduction is the only notebook that does not contain code. Because they are structured into cells, notebooks are human-readable as well as executable documents for data analysis. You can read a research story and interact with its execution and methodologies at the same time. Sometimes the code cells will already be prepared for you. At other times, you will need to find your own solution.
If you feel uncertain about the environment, there are many video tutorials online, such as https://www.youtube.com/watch?v=H9Iu49E6Mxs.
The videos also explain how to install the environment on your own machine. You don’t need to do this, as you can run the notebooks directly in either Binder (https://mybinder.org/) or Google Colab (https://colab.research.google.com/). There should be links to these at the top of your notebooks.
Please note that you might need an account for these environments. Once you are set up, everything should run smoothly.
If you want to run the environment on your own machine, you can go to https://www.anaconda.com/ and follow the instructions at https://www.anaconda.com/products/distribution to install Python and Jupyter notebooks. We would not recommend this for beginners, though, as it requires some understanding of command lines, environments, etc. Check out https://docs.jupyter.org/en/latest/running.html for details on running notebooks. This is how you download a file from GitHub: https://www.wikihow.com/Download-a-File-from-GitHub. At the beginning, it is better to focus on Python programming. It’s hard enough.
Let’s get started …
Unit 2: Machine Learning for Society and Culture¶
The second unit of our series, which introduces interactive data analysis with Python to humanities students, works through several research use cases using basic machine learning. For example, we will use association mining to analyse communities of practice. In the first session, we will explore political opinions in the US Congress and how users share interests. Later on, we employ network analysis to split a small community network into groups and clusters, before finally learning more about visualisation and image analysis.
Please note that you must finish the first unit of this course before you can start this unit.
Before you start, take a look at the overview below to see how the various sessions work together.
Political Communities¶
We start by exploring political and social community data using association mining, a popular exploration technique that, like clustering, finds patterns without needing labelled examples. It is very effective and easy to use. We will gain some real insights into US politics as well as other communities of practice.
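To give a feel for what association mining computes, here is a minimal sketch using only the Python standard library. The voting records are invented for illustration, and the helper name `pair_rules` and its `min_support` parameter are hypothetical; the session itself may well use a dedicated library and real data. The sketch counts how often two items occur together (support) and how often the second item appears given the first (confidence).

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set lists the topics one (invented)
# legislator voted in favour of. Not real voting records.
transactions = [
    {"education", "healthcare", "budget"},
    {"education", "healthcare"},
    {"healthcare", "budget"},
    {"education", "healthcare", "defence"},
    {"budget", "defence"},
]


def pair_rules(transactions, min_support=0.3):
    """Return rules (lhs, rhs, support, confidence) for item pairs."""
    n = len(transactions)
    # How many transactions contain each single item.
    item_counts = Counter(item for t in transactions for item in t)
    # How many transactions contain each pair of items.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to be interesting
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every legislator who supported education also supported healthcare, so that rule comes out on top. Real association mining tools extend the same idea to larger item sets and add further quality measures.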
Start the session from here.
Web Scraping¶
The web has become a unique source of data for community analysis, and a great deal of data can be found there. A big problem with web data, however, is that it is often inconsistent and heterogeneous. To get access to it, one often has to visit multiple websites and assemble their data. Let’s discuss some ways and tools for doing so in this session on web scraping.
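As a rough illustration of the core step in web scraping, the sketch below pulls link targets and link text out of a small piece of HTML using Python's built-in `html.parser` module. The page content, the `SAMPLE_HTML` variable, and the `LinkCollector` class are made up for this example, and nothing is downloaded; the tools used in the session may differ.

```python
from html.parser import HTMLParser

# Hypothetical page content. A real scraper would first download this text.
SAMPLE_HTML = """
<html><body>
  <h1>Community events</h1>
  <ul>
    <li><a href="/events/reading-group">Reading group</a></li>
    <li><a href="/events/film-night">Film night</a></li>
  </ul>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs.
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a link with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((self._current_href, text))
            self._current_href = None


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
for href, text in links:
    print(href, "|", text)
```

Running the same collector over pages from several sites and merging the resulting lists is one simple way to assemble heterogeneous web data into a single dataset.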
Start the session from here.
Graphs¶
Graphs from Data¶
In this session, we start digging into the details of (social) network analysis using a famous example from anthropology. Zachary’s karate club is an early example of network analysis.
In the first session, we learn how to transform datasets into networks.
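Turning a dataset into a network can be sketched in a few lines of plain Python. The example below assumes the data arrives as a list of pairwise interactions; the names and the `build_graph` helper are invented for illustration and are not the karate club data. Each person becomes a node and each recorded interaction becomes an undirected edge.

```python
from collections import defaultdict

# Hypothetical records: each row says that two (invented) club members
# interacted with each other.
interactions = [
    ("Ana", "Ben"),
    ("Ana", "Cho"),
    ("Ben", "Cho"),
    ("Cho", "Dev"),
    ("Dev", "Eli"),
]


def build_graph(pairs):
    """Build an undirected graph as a dict mapping node -> set of neighbours."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)  # undirected: store the edge in both directions
    return dict(graph)


graph = build_graph(interactions)

# Degree = number of neighbours, a first simple measure of how central a node is.
degrees = {node: len(neighbours) for node, neighbours in graph.items()}
most_connected = max(degrees, key=degrees.get)

for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
print("Most connected:", most_connected)
```

Dedicated network libraries offer the same structure with many more analysis functions, but the underlying idea of nodes linked by edges stays the same.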
Start the session from here.