
China's Internet "Water Army" Exposed - Undercover Researchers Expose Chinese Internet Water Army


Translation below.
In China, paid posters are known as the Internet Water Army because they are ready and willing to ‘flood’ the internet for whoever is willing to pay. The flood can consist of comments, gossip and information (or disinformation) and there seems to be plenty of demand for this army’s services.

This is an insidious tide. Positive recommendations can make a huge difference to a product’s sales but can equally drive a competitor out of the market. When companies spend millions launching new goods and services, it’s easy to understand why they might want to use every tool at their disposal to achieve success.

The loser in all this is the consumer who is conned into making a purchase decision based on false premises. And for the moment, consumers have little legal redress or even ways to spot the practice.

Today, Cheng Chen at the University of Victoria in Canada and a few pals describe how Cheng worked undercover as a paid poster on Chinese websites to understand how the Internet Water Army works. He and his friends then used what he learnt to create software that can spot paid posters automatically.

Paid posting is a well-managed activity involving thousands of individuals and tens of thousands of different online IDs. The posters are usually given a task to register on a website and then to start generating content in the form of posts, articles, links to websites and videos, even carrying out Q&A sessions.

Often, this content is pre-prepared or the posters receive detailed instructions on the type of things they can say. And there is even a quality control team that checks that the posts meet a certain ‘quality’ threshold. A post would not be validated if it were deleted by the host or composed of garbled words, for example.

Having worked undercover to find out how the system worked, Cheng and co then studied the pattern of posts that appeared on a couple of big Chinese websites: Sina.com and Sohu.com. In particular, they studied the comments on several news stories about two companies that they suspected of paying posters and who were involved in a public spat over each other’s services.

The Sina dataset consisted of over 500 users making more than 20,000 comments; the Sohu dataset involved over 200 users and more than 1000 comments.

Cheng and co went through all the posts, manually identifying those they believed were from paid posters, and then set about looking for patterns in their behaviour that could differentiate them from legitimate users. (Just how accurate their initial impressions were is a potential problem, they admit, but the same one that spam filters also have to deal with.)

They discovered that paid posters tend to post more new comments than replies to other comments. They also post more often, with 50 per cent of them posting every 2.5 minutes on average. They also move on from a discussion more quickly than legitimate users, discarding their IDs and never using them again.
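To make these behavioural signals concrete, here is a minimal sketch of how they might be computed per user. It is not the researchers' code: the `Comment` record, its field names and the three feature names are all assumptions made for illustration, and the thresholds that would separate paid posters from legitimate users are left out entirely.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Comment:
    # Hypothetical record layout; the article does not describe the dataset's format.
    user_id: str
    timestamp: float  # seconds since an arbitrary start time
    is_reply: bool    # True if the comment replies to another comment


def behaviour_features(comments: List[Comment]) -> Dict[str, Dict[str, float]]:
    """Summarise each user's posting behaviour with three simple numbers."""
    by_user: Dict[str, List[Comment]] = {}
    for c in comments:
        by_user.setdefault(c.user_id, []).append(c)

    features: Dict[str, Dict[str, float]] = {}
    for user, posts in by_user.items():
        posts.sort(key=lambda c: c.timestamp)
        times = [c.timestamp for c in posts]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        features[user] = {
            # Share of posts that start a new comment rather than reply to one.
            "new_comment_ratio": sum(not c.is_reply for c in posts) / len(posts),
            # Average time between consecutive posts; infinite for a single post.
            "mean_gap_seconds": mean(gaps) if gaps else float("inf"),
            # How long the ID stayed active in the discussion.
            "active_seconds": times[-1] - times[0],
        }
    return features
```

On this sketch, a user who only posts new comments, 150 seconds apart, and then disappears would score a high `new_comment_ratio`, a low `mean_gap_seconds` and a short `active_seconds`, which is the profile the article describes.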

What’s more, the content they post is measurably different. These workers are paid by the volume and so often take shortcuts, cutting and pasting the same content many times. This would normally invalidate their posts but only if it is spotted by the quality control team.

So Cheng and co built some software to look for repetitions and similarities in messages as well as the other behaviours they’d identified. They then tested it on the dataset they’d downloaded from Sina and Sohu and found it to be remarkably good, with an accuracy of 88 per cent in spotting paid posters. “Our test results with real-world datasets show a very promising performance,” they say.
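The article does not say how the software measures repetition, so the following is only a sketch of one simple way to do it, using Python's standard `difflib` module in place of whatever similarity measure the researchers actually chose. The function name and the 0.9 threshold are assumptions picked for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def duplicate_fraction(messages: List[str], threshold: float = 0.9) -> float:
    """Fraction of one user's messages that nearly duplicate another of their messages.

    The 0.9 threshold is an illustrative assumption, not a value from the study.
    """
    if len(messages) < 2:
        return 0.0
    flagged = set()
    # Compare every pair of messages once and remember both members of a close match.
    for (i, a), (j, b) in combinations(enumerate(messages), 2):
        if SequenceMatcher(None, a, b).ratio() >= threshold:
            flagged.update((i, j))
    return len(flagged) / len(messages)
```

A poster who pastes the same sentence again and again would score close to 1.0 here, while a user who writes each comment afresh would score close to 0.0. Comparing every pair is quadratic in the number of messages, which is tolerable for one user's comments but would need a cheaper method on a whole site.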

That’s an impressive piece of work and a good first step towards combating this problem, although they’ll need to test it on a much wider range of datasets. Nevertheless, these guys have the basis of a software package that will weed out a significant fraction of paid posters, provided these people conform to the stereotype that Cheng and co have measured.

And therein lies the rub. As soon as the first version of the software hits the market, paid posters will learn to modify their behaviour in a way that games the system. What Cheng and co have started is a cat and mouse game just like those that plague the antivirus and spam filtering industries.

And that means the battle ahead with the Internet Water Army will be long and hard.

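Translation:

In China, paid writers are commonly known as the Internet Water Army; they are happy to flood the internet on behalf of anyone who pays. The flood includes comments, rumours and information. For now, demand for the Water Army's services appears to be strong.

Positive recommendations can make a big difference to a product's sales, but they can also drive competitors out of the market. Companies spend heavily on products and services, so it is easy to see why they would use every means available to succeed. The biggest losers, however, are the customers who base their purchase decisions on this false information.

Cheng Chen of the University of Victoria in Canada and several colleagues describe his experience inside the Internet Water Army. By doing the work undercover, they learned how the Water Army operates, and he and his friends then developed software to identify these paid writers automatically.

Paid posting is a managed activity that may involve thousands of people and tens of thousands of different IDs. The posters are usually required to register on a website and then start posting comments, articles, and links to websites or videos, and even running Q&A sessions. There are explicit requirements for what posters say, and sometimes a dedicated quality control team ensures that posts meet a certain standard.

Once they understood how the system worked, Cheng and his friends began studying posts on the Sina and Sohu websites. Over 500 Sina users made more than 20,000 comments in total; over 200 Sohu users made more than 1,000 comments.

Cheng and his friends went through all the posts, looking for ways in which paid writers differ from ordinary users.

They found that paid writers are more inclined to post new comments than to reply to other comments. Half of them post a new comment every 2.5 minutes on average, and compared with ordinary users they leave a discussion sooner and then abandon their IDs.

What's more, the content they post is noticeably different: most of them take shortcuts, cutting and pasting the same content repeatedly.

Cheng and his friends developed dedicated software that looks for repeated and similar content in messages, along with the other behaviours they had identified. In tests on the Sina and Sohu data, its accuracy reached 88 per cent.

This work is a first step in the fight; next, they need to test it on a wider range of data. In other words, the coming contest with the Internet Water Army will be a tough battle.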

Original article from: technologyreview. Translation from: tech2ipo

Please credit when reprinting: 百尚 » China's Internet "Water Army" Exposed - Undercover Researchers Expose Chinese Internet Water Army