Talks & Workshops

2021

  • Python for Linguists.
    69. Stuts + Tacos 2021. May 2021, Leipzig, Germany.
    Abstract Notebook
    Python is an easy-to-learn programming language with a wealth of tools for linguistic research, from simple string manipulation and data cleaning to more advanced parsers and complex statistical algorithms. We will go over the basics of Python's features, look at spaCy and scikit-learn, and finally work through an example project. Some basic programming knowledge is advisable but not strictly necessary to follow along.
  • Best Practices in Ethical Data Collection.
    PyData Berlin New Year 2021 Meetup. January 2021, Berlin, Germany.
    Abstract YouTube Slides
    It seems to have become a tradition to confront the ethics of data science mostly by reflecting on controversies and righting past wrongs. While this may raise awareness, and continual growth in the right direction is certainly to be applauded, the question remains whether this approach is enough in a field that changes so rapidly from day to day and year to year. Framed by the major philosophical schools of ethics, from consequentialism to virtue ethics, this talk walks through best practices in ethical data collection that not only lead to more ethical project management but also guide institutions and individuals towards lasting, preemptive change in a field without clear role models.

2020

  • Computational Approaches to Asterisk Correction Resolution.
    68. Stuts. November 2020, Berlin, Germany.
    Abstract
    The lack of an editing function in many instant messaging services gave rise to asterisk corrections: messages that contain only the correction intended for an earlier message, often preceded by an asterisk. Asterisk corrections can take many forms, from simple substitutions to extensive clarifications, and can be approached from just as many computational perspectives. I will present several of these approaches and try to define the task of asterisk correction resolution within a bridging resolution framework.
  • How Computer Algorithms Expose Our Hidden Biases (Revisited).
    6. PyData SüdWest. February 2020, Heidelberg, Germany.
    Abstract Slides
    With the advent of ever more complex machine learning algorithms in systems used by millions of people every day, recent research has shown that the biases and stereotypes we feed into these "black boxes" are reflected in the finished product. From recommendation systems and image labelling to language classification and recidivism prediction, there is no easy way to deal with data that is, mostly subliminally, racist or sexist. Where solutions exist, they are so far tailored only to specific, clearly outlined domains and problems. Furthermore, no two types of bias are alike, which makes it difficult to transfer approaches from sexism to racism or vice versa. A basic understanding of implicit bias and how it shapes the data we use is key to predicting and avoiding the amplification of prejudice in an increasingly connected online world.

2019

  • (Lightning Talk) Your Algorithm is Probably Racist.
    PyCon DE & PyData Berlin. October 2019, Berlin, Germany.
    YouTube
  • Attacking Text - An Introduction to Adversarial Attacks in NLP.
    66. Stuts. November 2019, Munich, Germany.
    Abstract
    Adversarial examples have been making headlines in the computer vision community for a few years now, but until very recently they seemed to have little impact on natural language processing. Small changes to an image, mostly invisible to the human eye, can fool a neural network into classifying a turtle as a gun or a stop sign as a green light. Of course, a single sentence has significantly fewer features to perturb than a 512×512 colour image, yet machines can still be fooled by slight rephrasing and by exploiting real-world biases that have crept into the system. This talk gives a brief introduction to the technology and dangers of adversarial attacks and delves into the possible implications for testing and deploying natural language processing systems.
  • Graphemic Standardisation and Human Writing Systems.
    29. Tacos. June 2019, Saarbrücken, Germany.
    Abstract
    One of the oldest human inventions, writing has been around for a long time. For most of that time, the only authority governing what could be written was one's own wrist and writing utensil. This changed with the printing press, as people had to agree on what would be part of a character set and what would not. For computers, this task has mostly been taken over by the Unicode standard, just one in a long line of international rulebooks for graphemic standardisation. But what makes a good international standard when it comes to writing systems? Is Unicode the be-all and end-all of what we can expect of alphabets in the digital age, or is there still more to come? If we were to create a new standard, could emoji, of all things, be the way forward? In this workshop we will dive deep into some very diverse alphabets, explore the cultural and historical significance of the Unicode standard, and examine its advantages and shortcomings.

2018

  • How Computer Algorithms Expose Our Hidden Biases.
    64. Stuts. November 2018, Göttingen, Germany.
    Abstract
    With the advent of ever more complex machine learning algorithms in systems used by millions of people every day, recent research has shown that the biases and stereotypes we feed into these "black boxes" are reflected in the finished product. From recommendation systems and image labelling to language classification and recidivism prediction, there is no easy way to deal with data that is, mostly subliminally, racist or sexist. Where solutions exist, they are so far tailored only to specific, clearly outlined domains and problems. Furthermore, no two types of bias are alike, which makes it difficult to transfer approaches from sexism to racism or vice versa. A basic understanding of implicit bias and how it shapes the data we use is key to predicting and avoiding the amplification of prejudice in an increasingly connected online world.