Korea’s data protection watchdog recently imposed a hefty penalty on a startup for leaking a massive amount of personal information in the process of developing and commercializing a controversial female chatbot. The Personal Information Protection Commission (PIPC) accused Scatter Lab, a Seoul-based startup, of illegally using its clients’ personal information in the development and operation of an artificial intelligence-driven chatbot service called “Lee Luda.” Scatter Lab was ordered to pay 103.3 million won (approximately US$92,900) in penalties: a penalty surcharge of 55.5 million won and an administrative fine of 47.8 million won. It is the first time in Korea that the government has sanctioned the indiscriminate use of personal information by a company using AI technology.
Recent major amendments to three South Korean data privacy laws and their implications
On 9 January 2020, South Korea’s National Assembly passed amendments (the ‘Amendments’) to the three major data privacy laws: the Personal Information Protection Act (PIPA); the Act on the Promotion of Information and Communications Network Utilisation and Information Protection (‘Network Act’); and the Act on the Use and Protection of Credit Information (‘Credit Information Act’). The Amendments came into force on 5 August 2020, except for certain provisions of the Credit Information Act, which will come into effect between one year and 18 months after promulgation.
The Amendments largely aim to:
• Minimise the burden of redundant regulatory activities and confusion among regulated persons stemming from previously overlapping data privacy regulations and multiple supervisory bodies; and
• Develop a ‘data economy’ by introducing the concept of ‘pseudonymised data’ and a legal basis on which data may be used more flexibly, to an extent reasonably related to the original purpose of collection (a minimal pseudonymisation sketch follows this list).
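For illustration only: the Amendments do not prescribe any particular technique, but pseudonymisation is commonly implemented by replacing direct identifiers with keyed hashes so that records can still be linked internally without exposing the person. The sketch below is a minimal Python example; the key variable, field names, and token length are assumptions for illustration, not anything mandated by the law.

```python
import hmac
import hashlib

# Hypothetical secret key, stored separately from the data set; destroying or
# rotating it controls whether pseudonyms can ever be re-linked to people.
PSEUDONYM_KEY = b"replace-with-a-securely-stored-secret"

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier (name, phone number, user ID) with a keyed hash."""
    digest = hmac.new(PSEUDONYM_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated token is enough for internal joins

record = {"user_id": "kim.minji@example.com", "message": "Annyeong!"}
safe_record = {**record, "user_id": pseudonymise(record["user_id"])}
print(safe_record)  # the e-mail address is replaced by an opaque token
```

The design point is simply that the same input always maps to the same token, so analytics and model training can proceed on the pseudonymised records while the mapping back to a real person stays behind the key.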
AI Chatbot – Technical Details
The company illegally harvested data from 9.4 billion conversations conducted by 600,000 users of its other apps, “Science of Love” and “Text At”. The Science of Love app analysed conversations between romantic partners to predict a partner’s true level of affection. Those insecure about their mates could pay the equivalent of US$4.50 to upload their KakaoTalk messenger logs to Science of Love and be reassured of (or disappointed by) their partner’s level of love. In addition, the Lee Luda chatbot was programmed to select and speak one of about 100 million KakaoTalk conversation sentences from women in their 20s, the PIPC said.
The natural tone of Lee Luda was possible because Scatter Lab had collected “10 billion real-life conversations between young couples taken from KakaoTalk”, the most popular messaging application in South Korea (McCurry 2021). Scatter Lab did not collect conversations directly from KakaoTalk but took a roundabout, and rather sneaky, route: a few counselling-style applications analyse messenger conversations and give advice about users’ love lives when users agree to submit their KakaoTalk conversations to the apps. Scatter Lab obtained its data from those applications with little difficulty.
Sanction on Artificial Intelligence and Machine Learning Development
Scatter Lab is accused of using about 600,000 people’s 9.4 billion KakaoTalk conversations collected from its emotional analysis apps Science of Love and Text At in the process of developing and operating the Lee Luda chatbot service without obtaining their prior consent. The company is also accused of collecting personal information of about 200,000 children under the age of 14 without obtaining the consent of their parents or guardians in the development and operation process for its services. Scatter Lab did not set any age limit in recruiting subscribers for its app services and collected 48,000 children’s personal information through Text At, 120,000 children’s information from Science of Love and 39,000 children’s information from Lee Luda. The company was also criticized for failing to delete or encode the app users’ names, mobile phone numbers and personal addresses before using them in the development of its AI chatbot learning algorithms.
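The PIPC specifically faulted the company for not deleting or encoding names, mobile phone numbers, and addresses before the conversation logs were used to train the chatbot. Purely as an illustration of the kind of de-identification pass that was missing (the regexes, placeholder labels, and sample sentence below are assumptions, not Scatter Lab’s actual pipeline), such a step might look like this:

```python
import re

# Hypothetical patterns: Korean mobile numbers (010-XXXX-XXXX etc.) and simple
# street-address fragments ending in -ro (로) or -gil (길). A production system
# would need far broader rules (names, e-mail addresses, ID numbers, and so on).
PHONE_RE = re.compile(r"01[016789][-\s]?\d{3,4}[-\s]?\d{4}")
ADDRESS_RE = re.compile(r"\S+(?:로|길)\s?\d+")

def redact(utterance: str) -> str:
    """Mask obvious personal identifiers before an utterance enters a training corpus."""
    utterance = PHONE_RE.sub("[PHONE]", utterance)
    utterance = ADDRESS_RE.sub("[ADDRESS]", utterance)
    return utterance

# "My number is 010-1234-5678 and I live at Teheran-ro 123"
print(redact("내 번호는 010-1234-5678이고 테헤란로 123에 살아"))
# -> "내 번호는 [PHONE]이고 [ADDRESS]에 살아"
```

Running every utterance through a filter of this kind before training would not by itself satisfy consent requirements, but it addresses the separate failure the regulator identified: raw identifiers surviving into the model’s training data.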
Impact on Business
The app was a hit. By summer 2020, Science of Love had been downloaded 7.5 million times in South Korea and Japan, and Scatter Lab planned to start capitalizing on relationship insecurity in the US. Expansion came to a screeching halt, however, when Scatter Lab added a chatbot service named “Lee Luda”, marketed as a 20-something AI friend for those who had given up on human interaction. The chatbot attracted more than 750,000 users in just three weeks after its launch on Dec. 23, but Scatter Lab suspended the Facebook-based service the following month amid complaints over its proclivity for lewd, discriminatory, and homophobic speech aimed at sexual minorities, as well as its leaking of personal data. Furthermore, Lee Luda’s training data was uploaded to GitHub, exposing names, locations, relationship status, and even some medical information. Lee Luda was shut down, and Science of Love was slammed with damning Google Play reviews.
“This case is meaningful in that companies are not allowed to use personal information collected for specific services indiscriminately for other services without obtaining explicit consent from the concerned people,” PIPC Chairman Yoon Jong-in said. The case is also notable from a business point of view, given South Korean companies’ anticipated entry into the EU market and the prospect of easier transfers of personal data between South Korea and EU Member States. In this regard, companies will need to verify in advance whether they are subject to the EU’s General Data Protection Regulation (GDPR) and, if so, ensure compliance with its legal requirements to reduce legal risk.
Data Ethics and Preventive Controls
It is very common for the users of an internet service to be indifferent to how their personal data is used, even though they hold the rights to that data. They must agree to the terms of service (which state that their personal data will be collected and shared) or they cannot use the service at all. Yet they are often unaware of what those terms say, because they do not read the lengthy text or do not understand the legal language. They may know implicitly that their personal information will be revealed or used somewhere, at some time, but they do not know the exact usage or extent of disclosure. The best way to prevent data leakage or misuse is for individuals to understand what kind of data they are sharing, whom they are sharing it with, and where the data will be used.
In addition, data collectors often overlook the data ethics that oblige them to collect and handle data with caution. The lack of control over how data is used can obviously produce negative outcomes. Data collectors must therefore specify what kinds of data they will collect from data providers and how those data will be used. They should also recognise that data providers granted the right to use their data for a particular purpose, so the data cannot be transferred to others without agreement and must be treated carefully. Furthermore, there must be legal and technical mechanisms that protect data providers’ privacy and prevent data collectors from breaching the law, such as the purpose-limitation check sketched below.
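One simple technical mechanism of this kind is to record, per user, the purposes they actually consented to and to check every downstream use of their data against that record. The sketch below is a minimal illustration under assumed names (ConsentRecord, the purpose strings); a real system would also need audit logging, retention limits, and age verification for minors.

```python
from dataclasses import dataclass, field

@dataclass
class ConsentRecord:
    """Hypothetical record of what a data subject actually agreed to."""
    user_id: str
    purposes: set[str] = field(default_factory=set)  # e.g. {"relationship_analysis"}

def use_allowed(consent: ConsentRecord, requested_purpose: str) -> bool:
    """Purpose limitation: data may be used only for purposes the subject consented to."""
    return requested_purpose in consent.purposes

consent = ConsentRecord(user_id="u123", purposes={"relationship_analysis"})
print(use_allowed(consent, "relationship_analysis"))  # True
print(use_allowed(consent, "chatbot_training"))       # False -> requires fresh consent
```

Under a check like this, reusing Science of Love conversations to train a chatbot would be refused by default until the user explicitly consented to that new purpose, which is precisely the gap the PIPC sanctioned.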
In sum, keeping data safe is not a matter for any one group of people; it is a matter for everyone. By understanding how personal data should be shared, how the data one has shared can be used, and what steps are needed to protect it, we can safeguard our personal information and make good use of advanced technology without having it turned against us. Both data providers and data collectors need to take responsibility for the data they create, provide, collect, and use, keeping in mind that AI is built upon big data.